A constructive and unifying framework for zero-bit watermarking

Teddy Furon 
Abstract 

In the watermark detection scenario, also known as zero-bit watermarking, a watermark, carrying no hidden 
message, is inserted in a piece of content. The watermark detector checks for the presence of this particular weak 
signal in received contents. The article looks at this problem from a classical detection theory point of view, but 
with side information enabled at the embedding side. This means that the watermark signal is a function of the host 
content. Our study is twofold. The first step is to design the best embedding function for a given detection function, 
and the best detection function for a given embedding function. This yields two conditions, which are mixed into one 
'fundamental' partial differential equation. It appears that many famous watermarking schemes are indeed solutions 
of this 'fundamental' equation. This study thus gives birth to a constructive framework unifying solutions so far 
perceived as very different. 

Index Terms 

Zero-bit watermarking, Pitman-Noether theorem, detection theory. 

I. Introduction 

In the past six years, side-informed embedding strategies have been shown to greatly improve watermark decoding. 
They exploit knowledge of the host signal during the construction of the watermark signal. The theory underlying 
these side-informed schemes was presented in the famous paper "Writing on Dirty Paper" by M. Costa in 1983. Our 
work gives some theoretical insight into the performance achievable when using side information at the embedding 
side, as in Costa's correspondence, but for the watermark detection problem (a.k.a. zero-bit watermarking [1, Sect. 
2.2.3]). This problem has surprisingly received almost no study compared to watermark decoding, although it is 
perceived as a non-trivial problem [2], [3]. Notable exceptions are the works of M. Miller et al. (embedding 
cone) [4], JANIS [5], and watermark detection with distortion compensated dither modulation (DC-DM) schemes 
[6]. 
A. Motivations from the application side 

The trade-off between the payload of the hidden message and robustness is a well-known fact in watermarking. The 
main rationale for zero-bit watermarking is that the maximum robustness a watermarking primitive can inherently 
offer is expected when the payload is reduced to the minimum. Here are two application scenarios where zero-bit 
watermarking might be sufficient, i.e. it is not necessary to hide a message, but just the presence of a mark. 

Some copy protection platforms [7] use watermarks as flags whose presence warns compliant devices that the 
piece of content they are dealing with is copyrighted material. Content access and copy protection are tackled by 
cryptographic primitives. Watermarking just closes the 'analog hole' [8]-[10]. In other words, compliant devices 
expect three kinds of content: commercial contents which are encrypted and watermarked, free contents which 
are in the clear and not watermarked, and pirated contents coming through the 'analog hole' which are in the clear but 
watermarked. Although most DRM systems hide a message such as a copy status, we have seen here that the 
presence of a mark is indeed sufficient. 

Copyright protection is the most famous application of watermarking. However, hiding the name of the author 
in his Work is just a fact having no legal value. In Europe, the author first must be a member of an author society, 
then he registers his Work. The only legal proof is to give evidence that the suspicious image is indeed a version 
of a Work duly registered in an author society's database. Consequently, this is a yes/no question, which can be 
solved by detecting the presence or absence of a watermark previously embedded by an author society. 

In these two applications, the presence of a watermark is not a secret, contrary to a steganographic scenario. The 
attacker obviously knows which content is watermarked. In the copy protection application, for instance, there is no 
point in attacking a personal video which is a free content, protected neither by encryption nor by watermarking. 

B. Motivations from the scientific side 

Zero-bit watermarking is closely related to the detection of weak signals in a noisy environment: the watermark signal 
is embedded in a host signal, unknown to the detector. Its power is very weak compared to that of the host. 
Watermarkers resorted to classical elements of detection theory very early. This includes the use of the 
Neyman-Pearson and Pitman-Noether theorems, the calculus of asymptotic efficacy, LMP (Locally Most Powerful) tests [11], 
and robust statistics [12]. 

The priority at that time was to design a better detector than the classical correlation, which is only optimal 
for white host signals. To name a few, this includes the works of teams such as Q. Cheng and T. Huang [13], A. 
Briassouli and M. Strinzis [14], M. Barni et al. [15]. They assume that the host signals are drawn from a known 
pdf (probability density function), and they apply the above-mentioned classical elements of detection theory. X. 
Huang and B. Zhang relax this implicit assumption by considering that the 'real' pdf of the host belongs to a given 
family of distributions [16]. Their test is designed to perform fairly well over the entire family. This makes it 
possible to encompass attacks that modify the pdf within the family. 

Another track is to see the host signal as a side information only available at the embedding. Side information 
brings huge improvements in watermark decoding. However, its use for zero-bit watermarking has received less 
interest. Pioneering works are mostly heuristic approaches [4], [17]. More recent works use the binning principle to 
achieve zero-bit watermarking [6], [18], although J. Eggers notices that SCS (Scalar Costa Scheme) is less efficient 
for zero-bit than for positive rate watermarking schemes [19, Sect. 3.6]. Indeed, Erez et al. prove the optimality of 
DC-DM based on lattices (those whose Voronoi region asymptotically tends to a hypersphere) for strictly positive 
rate data hiding as far as an additive white noise attack is considered [20]. In the case of zero-rate watermarking, 
P. Moulin et al. reasonably conjecture that sparse lattice DC-DM is optimal [21]. For zero-bit watermarking, lattice 
DC-DM achieves high performance, showing some host interference rejection [6]. However, there is a loss of 
efficacy compared to the private setup where the side information is also available at the detector. 

At first glance, it would seem that the problem of watermark detection is simpler than the decoding of hidden 
symbols, because the decoder's output belongs to a message space which is bigger than the detector's range 
B = {0, 1}. In other words, whereas watermark detection implies a simple binary hypotheses test, decoding of 
watermark is a complex multiple hypotheses test. 

Yet, almost no theoretical limit, i.e. an equivalent of Costa's result but for watermark detection, has been shown, 
except [22, Sect. 2] which only tackles the Gaussian case. N. Merhav mentioned during the WaCha'05 workshop 
in Barcelona that zero-bit watermarking is a hard problem whose optimal solution is not known for the moment 
[2]. In particular, up to now, there is no reason why the binning principle should be optimal, even if, as far as the 
author knows from the literature, it has the best performance against an AWGN attack. Yet, DC-DM schemes are 
known to be weak against scale gain attacks. 

II. Strategy and notation 

Our goal is not to derive an accurate statistical model of the host signal as done in the above-mentioned prior 
works. On the contrary, very basic assumptions (Gaussian distribution or flat-host assumption) are in order, allowing 
us to stress the major role of side information at the embedding side. While the binning scheme is commonly used 
to exploit side information, it is not the only way. Our approach is indeed closer to the theory of weak signal 
detection. 

A. Embedding side 

The embedder transforms an original host signal $\mathbf{s}$ into a watermarked content $\mathbf{y} = \mathbf{f}(\mathbf{s}) = \mathbf{s} + \mathbf{x}$. The host signal 
or channel state $\mathbf{s}$ is a vector of $n$ components of the original content, modeled as random variables. The notational 
key of the article is to decompose the watermark signal $\mathbf{x}$ as a unit power vector $\mathbf{w}$ and an amplitude $\theta$: 

$$\mathbf{f}(\mathbf{s}) = \mathbf{s} + \mathbf{x} = \mathbf{s} + \theta\,\mathbf{w}(\mathbf{s}). \tag{1}$$

$\mathbf{w}$ is a smooth function from $\mathbb{R}^n$ to $\mathbb{R}^n$, with the constraints $E_S\{\mathbf{w}(\mathbf{s})\} = \mathbf{0}$ and $E_S\{\|\mathbf{w}(\mathbf{s})\|^2\} = n$. This vector 
gives a direction pointing to an acceptance region of $\mathbb{R}^n$, towards which the host signal should be pushed. The scalar 
$\theta$ controls the gain or amplitude of the watermark signal. Theoretical frameworks often use a constant $\theta = \sqrt{P}$, 
where $P$ is the fixed power of $\mathbf{x}$. Yet, in practice, host contents might support different watermark powers depending 
on their individual masking property. This change might even occur within a content, such that we should resort to 
a vector $\boldsymbol{\theta} = (\theta_1, \cdots, \theta_n)$ gathering positive and small gains affecting each sample. We restrict our study to a scalar 
gain for the sake of simplicity, but the results of this paper can be easily extended to a vector gain. In this case, 
one might consider $\theta$ as the average gain. 

Both parts of the watermark signal depend on the host content, either through side information, or for some 
perceptual reasons. Unfortunately, in blind schemes, side information is not made available at the detection side. 
Moreover, we wish to maintain a low detector complexity, which prevents the use of a human visual or auditory 
system model in order to recreate an estimate of $\theta$ based on the received content. The only fact the detector knows is 
that the watermark amplitude $\theta$ is positive and small. We believe this model allows a great flexibility which eases 
practical implementations of watermarking schemes. 

B. Detection side 

Upon receipt of signal $\mathbf{r}$, the detector makes a binary decision: $d = 1$ (resp. $d = 0$) means that, according to the 
detector, the piece of content under scrutiny is watermarked (resp. has not been watermarked). There are two 
hypotheses: under hypothesis $\mathcal{H}_0$, the detector receives an original content $\mathbf{r} = \mathbf{r}_0 = \mathbf{s}$ (see the end of Subsection II-A 
for justifications), whereas under hypothesis $\mathcal{H}_1$, the detector receives a watermarked and possibly attacked content 
$\mathbf{r} = \mathbf{r}_1$. The probability of false alarm $P_{fa}$ and the power of the test $P_p$ are given by 

$$P_{fa} = \Pr\{d = 1 \,|\, \mathcal{H}_0\}, \qquad P_p = \Pr\{d = 1 \,|\, \mathcal{H}_1\}. \tag{2}$$

Once again, in zero-bit watermarking, no symbol is transmitted. Our problem is then fundamentally different 
from the communication of one bit because, under hypothesis $\mathcal{H}_0$, no processing is applied and $\mathbf{s}$, given by Nature, 
is directly sent to the detector. 

We assume that the detector has the structure of a Neyman-Pearson test. First, it applies a detection function 
$t$ mapping from $\mathbb{R}^n$ to $\mathbb{R}$. Then, this scalar is compared to a threshold $\tau$: $d = 1$ if $t(\mathbf{r}) > \tau$, $d = 0$ otherwise. The 
threshold is given by the constraint of a significance level $\alpha$ such that $P_{fa} = E_R\{d \,|\, \mathcal{H}_0\} \leq \alpha$. Note that, for a given 
detection function, this threshold does not depend on what happens under hypothesis $\mathcal{H}_1$ (embedding function $\mathbf{w}$, 
watermark amplitude $\theta$). Moreover, we assume without loss of generality that, under hypothesis $\mathcal{H}_0$, $t(\mathbf{r})$ is a 
centered random variable with unit variance: 

$$E_R\{t(\mathbf{r}) \,|\, \mathcal{H}_0\} = 0, \qquad \mathrm{Var}\{t(\mathbf{r}) \,|\, \mathcal{H}_0\} = 1. \tag{3}$$

If this is not the case, it is easy to build the test $\tilde{t}(\mathbf{r}) = (t(\mathbf{r}) - E_R\{t(\mathbf{r}) \,|\, \mathcal{H}_0\}) / \sqrt{\mathrm{Var}\{t(\mathbf{r}) \,|\, \mathcal{H}_0\}}$. 

C. Pitman Noether efficacy 

In this article, the tests are compared asymptotically for $n \to +\infty$. The Pitman-Noether theorem indicates that 
the best test has the highest efficacy $\eta$, whose general definition is given by [11, Sect. III.C.3]: 

$$\eta = \lim_{n \to +\infty} \frac{1}{n^{2m\delta}} \, \frac{\left( \left. \frac{\partial^m}{\partial \theta^m} E_R\{t(\mathbf{r}) \,|\, \mathcal{H}_1\} \right|_{\theta = 0} \right)^2}{\mathrm{Var}\{t(\mathbf{r}) \,|\, \mathcal{H}_0\}}, \tag{4}$$
where $m$ is the first integer for which the $m$-th derivative of $E_R\{t(\mathbf{r}) \,|\, \mathcal{H}_1\}$ is not null, and $\delta$ a positive scalar 
such that the limit is not null. In our problem, it is not unreasonable to assume $m = 1$ and $\delta = 1/2$ because the 
expectation of the detection function grows with $\sqrt{n}$ as $\mathrm{Var}\{t(\mathbf{r}) \,|\, \mathcal{H}_0\}$ has been set to one for all $n$. This is at least 
true for well-known watermarking schemes. We are not able to find a counter-example, i.e. a watermarking scheme 
having a better growth rate than $\sqrt{n}$. Therefore, we restrict our analysis to $\delta = 1/2$. 

The Pitman Noether theorem holds for composite one-sided hypothesis tests. The motivations given in Sect. II-A clearly 
show that our problem is not a simple hypothesis test ($\mathcal{H}_0 : \theta = 0$ versus $\mathcal{H}_1 : \theta = \sqrt{P}$ fixed), but a composite 
one-sided hypothesis test ($\mathcal{H}_0 : \theta = 0$ versus $\mathcal{H}_1 : \theta > 0$). 

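Last but not least, the proof of this theorem is based on an asymptotic study where the alternative hypothesis 
$\mathcal{H}_1$ has a vanishing parameter $\theta_n = k n^{-\delta}$, with $k$ a positive constant. Important assumptions are the following 
regularity conditions: 

$$\lim_{n \to \infty} \frac{\left. \frac{\partial}{\partial \theta} E_R\{t(\mathbf{r}) \,|\, \mathcal{H}_1\} \right|_{\theta = \theta_n}}{\left. \frac{\partial}{\partial \theta} E_R\{t(\mathbf{r}) \,|\, \mathcal{H}_1\} \right|_{\theta = 0}} = 1 \quad \text{and} \quad \lim_{n \to \infty} \frac{\mathrm{Var}\{t(\mathbf{r}) \,|\, \mathcal{H}_1\}}{\mathrm{Var}\{t(\mathbf{r}) \,|\, \mathcal{H}_0\}} = 1, \tag{5}$$

and that $t(\mathbf{r}) - E_R\{t(\mathbf{r})\}$ tends (convergence in law), as $n \to \infty$, to a normal variable, both under $\mathcal{H}_1$ and under 
$\mathcal{H}_0$. 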

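We also define the efficiency per element (a.k.a. the differential detector SNR) in the same way as the efficacy 
but without the limit, such that in our case: 

$$\eta = \frac{1}{n} \left( \left. \frac{\partial}{\partial \theta} E_R\{t(\mathbf{r}) \,|\, \mathcal{H}_1\} \right|_{\theta = 0} \right)^2. \tag{6}$$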
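To make the role of the efficiency per element concrete, the following Python sketch computes the Neyman-Pearson threshold and a Gaussian approximation of the power of the test. It assumes, as in the regularity conditions above, that $t(\mathbf{r})$ is asymptotically normal with unit variance under both hypotheses, and that its mean under $\mathcal{H}_1$ is approximated to first order by $\theta \sqrt{n \eta}$, which follows from (6). The function names are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # t(r) is centered with unit variance under H0, see (3)


def np_threshold(alpha: float) -> float:
    """Threshold tau such that Pr{t > tau | H0} = alpha when t ~ N(0, 1)."""
    return STD_NORMAL.inv_cdf(1.0 - alpha)


def asymptotic_power(alpha: float, eta: float, n: int, theta: float) -> float:
    """Gaussian approximation of the power of the test.

    Assumption: under H1, t(r) ~ N(theta * sqrt(n * eta), 1), i.e. a
    first-order expansion of the mean around theta = 0 and unchanged variance.
    """
    shift = theta * (n * eta) ** 0.5
    return 1.0 - STD_NORMAL.cdf(np_threshold(alpha) - shift)


if __name__ == "__main__":
    # A higher efficiency per element gives a more powerful test
    # for the same significance level, length n and gain theta.
    for eta in (1.0, 2.0, 3.0):
        print(eta, asymptotic_power(alpha=1e-3, eta=eta, n=1000, theta=0.1))
```

Under this approximation, the power only depends on the product $\theta \sqrt{n \eta}$, which is why tests are ranked by $\eta$ alone.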



III. Detection of weak signal dependent on side information 

The goal of this section is to give the expressions for the best detection and the best embedding functions. We 
mean 'best' in the sense of the Pitman Noether theorem, i.e. such that they maximize the efficiency per element. 

This section does not consider any attack. Hence, the Pitman Noether theorem considers signals $\mathbf{r}_0 = \mathbf{s}$ and 
$\mathbf{r}_1 = \mathbf{y} = \mathbf{s} + \theta_n \mathbf{w}(\mathbf{s})$, with $E_S\{\|\mathbf{w}(\mathbf{s})\|^2\} = n$ and $\theta_n = k / \sqrt{n}$, $k > 0$. It means that the proof of this theorem 
fixes the embedding distortion to $D_e = \theta_n^2 n = k^2$, but as $n$ increases, the power of the watermark signal 
vanishes. 

A. Best detector for a given embedding function 

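In this subsection, embedding function $\mathbf{w}$ is fixed. A well-known corollary of the Pitman Noether theorem [11, 
Sect. III.C.3] states that the Locally Most Powerful (LMP) test in $\theta = 0$ is asymptotically the best. A Cauchy-Schwarz 
inequality gives: 

$$\left( \left. \frac{\partial}{\partial \theta} E_R\{t(\mathbf{r}) \,|\, \mathcal{H}_1\} \right|_{\theta = 0} \right)^2 = \left( \int t(\mathbf{r}) \left. \frac{\partial p(\mathbf{r} | \mathcal{H}_1)}{\partial \theta} \right|_{\theta = 0} d\mathbf{r} \right)^2 \tag{7}$$

$$\leq \int t(\mathbf{r})^2 p(\mathbf{r} | \mathcal{H}_0) \, d\mathbf{r} \int \frac{1}{p(\mathbf{r} | \mathcal{H}_0)} \left( \left. \frac{\partial p(\mathbf{r} | \mathcal{H}_1)}{\partial \theta} \right|_{\theta = 0} \right)^2 d\mathbf{r} \tag{8}$$

$$= \int \frac{1}{p(\mathbf{r} | \mathcal{H}_0)} \left( \left. \frac{\partial p(\mathbf{r} | \mathcal{H}_1)}{\partial \theta} \right|_{\theta = 0} \right)^2 d\mathbf{r}, \tag{9}$$

with equality for the LMP test: 

$$t(\mathbf{r}) = k_t \, \frac{1}{p(\mathbf{r} | \mathcal{H}_0)} \left. \frac{\partial p(\mathbf{r} | \mathcal{H}_1)}{\partial \theta} \right|_{\theta = 0}, \tag{10}$$

where $k_t$ is a positive constant whose role is explained below. The use of the LMP test in $\theta = 0$ is reinforced in 
practice by the fact that the watermark power is very weak compared to the host power. 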

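When there is no attack, $p(\mathbf{r} | \mathcal{H}_0) = p_S(\mathbf{r})$ and $p(\mathbf{r} | \mathcal{H}_1) = p_Y(\mathbf{r})$. We assume there exists $\epsilon > 0$ such that function 
$\mathbf{f}(\mathbf{s})$ is invertible at least when $0 \leq \theta < \epsilon$: $\mathbf{s} = \mathbf{f}^{-1}(\mathbf{y})$. This allows us to write $p_Y(\mathbf{r}) = p_S(\mathbf{f}^{-1}(\mathbf{r})) \, |J_{\mathbf{f}^{-1}}(\mathbf{r})|$, with 
the last term being the determinant of the Jacobian matrix of $\mathbf{f}^{-1}$ taken at $(\mathbf{r}, \theta)$. Developing this last equation (see 
Appendix I), we finally get these expressions: 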



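$$t(\mathbf{r}) = -k_t \, \frac{\mathrm{div}(p_S(\mathbf{r}) \mathbf{w}(\mathbf{r}))}{p_S(\mathbf{r})} \tag{11}$$

$$= -k_t \, \frac{\nabla p_S(\mathbf{r})^T}{p_S(\mathbf{r})} \mathbf{w}(\mathbf{r}) - k_t \, \mathrm{div}(\mathbf{w}(\mathbf{r})). \tag{12}$$

The first term of (12) corresponds to the classical non-linear correlation based LMP test [13]-[15], whereas the 
second term is not null whenever side information is enabled at the embedding side. 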

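Let $B_n(R)$ be the ball of radius $R$ centered on $\mathbf{0}$, $S_n(R)$ the associated hypersphere, and $E(R) = \int_{B_n(R)} t(\mathbf{r}) p_S(\mathbf{r}) \, d\mathbf{r}$. 
Then, thanks to the Gauss theorem, we have 

$$|E(R)| = k_t \left| \int_{B_n(R)} \mathrm{div}(p_S(\mathbf{r}) \mathbf{w}(\mathbf{r})) \, d\mathbf{r} \right| \tag{13}$$

$$= k_t \left| \int_{S_n(R)} p_S(\mathbf{r}) \mathbf{w}(\mathbf{r})^T \mathbf{e}(\mathbf{r}) \, d\mathbf{r} \right| \tag{14}$$

$$\leq k_t \int_{S_n(R)} p_S(\mathbf{r}) \|\mathbf{w}(\mathbf{r})\| \, d\mathbf{r}, \tag{15}$$

with $\mathbf{e}(\mathbf{r})$ the unit normal vector at position $\mathbf{r}$ on $S_n(R)$. $E_S\{\|\mathbf{w}(\mathbf{r})\|^2\} < \infty$ implies that $\lim_{R \to \infty} E(R) = 0$. 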

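This shows that the expectation of the detection function given by (12) is zero under hypothesis $\mathcal{H}_0$, as required 
in II-B. The constant $k_t$ enforces that $\mathrm{Var}\{t(\mathbf{r}) \,|\, \mathcal{H}_0\} = 1$: 

$$k_t = \left( \int \frac{1}{p(\mathbf{r} | \mathcal{H}_0)} \left( \left. \frac{\partial p(\mathbf{r} | \mathcal{H}_1)}{\partial \theta} \right|_{\theta = 0} \right)^2 d\mathbf{r} \right)^{-1/2}. \tag{16}$$

Finally, (10) and (16) give the efficiency per element for such tests: 

$$\eta = n^{-1} k_t^{-2}. \tag{17}$$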
B. Best embedding function for a given detection function 

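The detection function $t$ being given (such that $t(\mathbf{r}_0)$ is a centered random variable with unit variance), we write: 

$$\left. \frac{\partial}{\partial \theta} E_R\{t(\mathbf{r}) \,|\, \mathcal{H}_1\} \right|_{\theta = 0} = E_S\left\{ \left. \frac{\partial}{\partial \theta} t(\mathbf{s} + \theta \mathbf{w}(\mathbf{s})) \right|_{\theta = 0} \right\} \tag{18}$$

$$= E_S\{\mathbf{w}(\mathbf{s})^T \nabla t(\mathbf{s})\}. \tag{19}$$

It appears that, for a given $t$, it is important to let $\mathbf{w}(\mathbf{s}) \propto \nabla t(\mathbf{s})$, $\forall \mathbf{s} \in \mathbb{R}^n$. The efficiency per element is then 
upper bounded by the following Cauchy-Schwarz inequality: 

$$\eta \leq \frac{1}{n} \left( \int p_S(\mathbf{s}) \|\mathbf{w}(\mathbf{s})\| \|\nabla t(\mathbf{s})\| \, d\mathbf{s} \right)^2 \leq \int p_S(\mathbf{s}) \|\nabla t(\mathbf{s})\|^2 \, d\mathbf{s}, \tag{20}$$

with equality when: 

$$\mathbf{w}(\mathbf{s}) = k_w \nabla t(\mathbf{s}) \quad \forall \mathbf{s} \in \mathbb{R}^n, \tag{21}$$

where $k_w$ is a normalizing constant to achieve $E_S\{\|\mathbf{w}(\mathbf{s})\|^2\} = n$: 

$$k_w = \sqrt{n / E_S\{\|\nabla t(\mathbf{s})\|^2\}}. \tag{22}$$

(20) and (22) give the efficiency per element for such tests: 

$$\eta = n k_w^{-2} = E_S\{\|\nabla t(\mathbf{s})\|^2\}. \tag{23}$$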
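The construction of this subsection can be checked numerically on a simple case. The sketch below is only an illustration under stated assumptions: it works with a single sample ($n = 1$), a standard Gaussian host, and the detection function $t(r) = (r^2 - 1)/\sqrt{2}$, so that $t'(r) = \sqrt{2}\, r$. The helper names `gaussian_expectation` and `best_embedding` are hypothetical.

```python
import math


def gaussian_expectation(g, sigma=1.0, half_width=12.0, steps=24000):
    """E{g(S)} for S ~ N(0, sigma^2), computed with the trapezoidal rule."""
    a = -half_width * sigma
    h = 2.0 * half_width * sigma / steps
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(steps + 1):
        s = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(s) * math.exp(-0.5 * (s / sigma) ** 2) / norm
    return total * h


def best_embedding(t_prime, sigma=1.0):
    """Scalar analogue (n = 1) of the normalized gradient embedding.

    Returns the embedding function w = k_w * t' and the efficiency
    eta = E{t'(S)^2}, with k_w = 1 / sqrt(eta).
    """
    eta = gaussian_expectation(lambda s: t_prime(s) ** 2, sigma)
    k_w = 1.0 / math.sqrt(eta)
    return (lambda s: k_w * t_prime(s)), eta


# Detection function t(r) = (r^2 - 1) / sqrt(2), hence t'(r) = sqrt(2) * r.
w, eta = best_embedding(lambda r: math.sqrt(2.0) * r)
```

With these assumptions the sketch recovers $\eta = E\{2 s^2\} = 2$ and the proportional embedding $w(s) = s$, which has unit power as required.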

C. Synthesis 

For the moment, we know how to design the best embedding function for a given detection function, and how to 
design the best detection function for a given embedding function. This is reminiscent of the Lloyd-Max algorithm in 
quantization. However, dealing with closed form equations, we can insert (21) in (12), yielding a partial differential 
equation that we loosely name the 'fundamental equation of zero-bit watermarking': 

$$p_S(\mathbf{r}) t(\mathbf{r}) + k_t k_w \, \mathrm{div}(p_S(\mathbf{r}) \nabla t(\mathbf{r})) = 0 \quad \forall \mathbf{r} \in \mathbb{R}^n. \tag{24}$$

Hence, the best couple of detection/embedding functions $\{t, \mathbf{w}\}$ is $\{t^\star, k_w \nabla t^\star\}$, with $t^\star$ a fundamental solution, 
i.e. a solution of (24). Note that (17) and (23) are still valid. Therefore, it is possible to build a scheme of a given 
efficiency $\eta$ (virtually, as high as possible), provided (24) admits a solution with $k_w k_t = \eta^{-1}$. The fundamental equation can 
also be written as: 

$$\eta \, t(\mathbf{r}) + \frac{\nabla p_S(\mathbf{r})^T}{p_S(\mathbf{r})} \nabla t(\mathbf{r}) + \nabla^2 t(\mathbf{r}) = 0, \tag{25}$$

$\nabla^2 t(\mathbf{r})$ being the Laplacian of $t(\mathbf{r})$. 

D. A geometric property of fundamental solutions 

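A nice property induced by the fundamental equation is that a pair of its solutions with different efficiencies per 
element are orthonormal for the scalar product $(\cdot, \cdot)$ defined here for two functions $g$ and $h$ by: 

$$(g, h) = E_R\{g(\mathbf{r}) h(\mathbf{r}) \,|\, \mathcal{H}_0\}. \tag{26}$$

Denote $L[t] = \mathrm{div}(p_S(\mathbf{r}) \nabla t(\mathbf{r}))$. This differential operator is symmetric if $\int_{\mathbb{R}^n} t_i(\mathbf{r}) L[t_j](\mathbf{r}) \, d\mathbf{r} = \int_{\mathbb{R}^n} L[t_i](\mathbf{r}) t_j(\mathbf{r}) \, d\mathbf{r}$. 
In our case, 

$$\int_{\mathbb{R}^n} t_i(\mathbf{r}) L[t_j](\mathbf{r}) \, d\mathbf{r} - \int_{\mathbb{R}^n} t_j(\mathbf{r}) L[t_i](\mathbf{r}) \, d\mathbf{r} = \int_{\mathbb{R}^n} \mathrm{div}\big(p_S(\mathbf{r}) (t_i(\mathbf{r}) \nabla t_j(\mathbf{r}) - t_j(\mathbf{r}) \nabla t_i(\mathbf{r}))\big) \, d\mathbf{r}. \tag{27}$$

The symmetry is enabled for functions $t_i, t_j$ if the last term, denoted by $C$, is zero. Let us write it as a limit: 

$$C = \int_{\mathbb{R}^n} \mathrm{div}\big(p_S(\mathbf{r}) (t_i(\mathbf{r}) \nabla t_j(\mathbf{r}) - t_j(\mathbf{r}) \nabla t_i(\mathbf{r}))\big) \, d\mathbf{r} \tag{28}$$

$$= \lim_{R \to \infty} \int_{B_n(R)} \mathrm{div}\big(p_S(\mathbf{r}) (t_i(\mathbf{r}) \nabla t_j(\mathbf{r}) - t_j(\mathbf{r}) \nabla t_i(\mathbf{r}))\big) \, d\mathbf{r} \tag{29}$$

$$= \lim_{R \to \infty} \int_{S_n(R)} p_S(\mathbf{r}) \big(t_i(\mathbf{r}) \nabla t_j(\mathbf{r})^T \mathbf{e}(\mathbf{r}) - t_j(\mathbf{r}) \nabla t_i(\mathbf{r})^T \mathbf{e}(\mathbf{r})\big) \, d\mathbf{r}. \tag{30}$$

The Gauss theorem gives the latter equation. Assuming that the pdf of the host vanishes more quickly than the norm 
$\|t_i(\mathbf{r}) \nabla t_j(\mathbf{r})\|$, we suppose in the sequel that the symmetry property is enabled for the solutions of the fundamental 
equation. Then, (24) in (27) gives 

$$\int_{\mathbb{R}^n} t_i(\mathbf{r}) L[t_j](\mathbf{r}) \, d\mathbf{r} - \int_{\mathbb{R}^n} t_j(\mathbf{r}) L[t_i](\mathbf{r}) \, d\mathbf{r} = -\int_{\mathbb{R}^n} t_i(\mathbf{r}) \eta_j p_S(\mathbf{r}) t_j(\mathbf{r}) \, d\mathbf{r} + \int_{\mathbb{R}^n} t_j(\mathbf{r}) \eta_i p_S(\mathbf{r}) t_i(\mathbf{r}) \, d\mathbf{r} \tag{31}$$

$$= (\eta_i - \eta_j)(t_i, t_j) = 0. \tag{32}$$

The restriction to normalized detection functions and this last equation imply that $(t_i, t_j) = \delta(j - i)$ where $\delta$ is the 
Kronecker delta function. Hence, the solutions of the fundamental equation with different efficiencies per element 
constitute a family of orthonormal functions (Subsection IV-B.1 even shows orthonormal functions sharing the same 
efficiency), if the symmetry property holds for all pairs of elements of this family. 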

IV. Some solutions of the fundamental equation of zero-bit watermarking 

We are not able to find a general solution of the fundamental equation. However, this section exhibits solutions 
in some particular cases. 

A. The scalar case 

To avoid multiplication of notation, we use the same letter to denote the scalar version of above-mentioned 
vectorial functions. 

We suppose here that the host samples are i.i.d. such that $p_S(\mathbf{s}) = \prod_{i=1}^{n} p_S(s_i)$. Moreover, our strategy is to 
maintain this statistical independence while embedding the watermark: $\mathbf{w}(\mathbf{s}) = (e_1 w(s_1), \cdots, e_n w(s_n))^T$, where 
$\mathbf{e}$ is a secret vector, with for instance $e_i = \pm 1$ $\forall i \in \{1, \cdots, n\}$. Equation (12) shows that the detection function is indeed 
a sum over the samples, $t(\mathbf{r}) \propto \sum_i e_i \, t(r_i)$, and (25) boils down to a scalar second-order ordinary differential equation with non-constant 
coefficients: 

$$\eta \, t(r) + \frac{p_S'(r)}{p_S(r)} t'(r) + t''(r) = 0. \tag{33}$$
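TABLE I 

Polynomial solutions of the scalar Gaussian case $s \sim \mathcal{N}(0, 1)$. 

$\eta$ | $w(s)$ | $t(r)$ | $\mathrm{Var}\{t(r) \,|\, \mathcal{H}_1\}$
1 | $1$ | $r$ | $1$
2 | $s$ | $(-1 + r^2)/\sqrt{2}$ | (missing in source)
3 | $(-1 + s^2)/\sqrt{2}$ | $(-3r + r^3)/\sqrt{6}$ | $1 + 66\theta^2 + o(\theta^3)$
4 | $(-3s + s^3)/\sqrt{6}$ | $(3 - 6r^2 + r^4)/(2\sqrt{6})$ | $1 + 12\sqrt{6}\,\theta + 608\theta^2 + o(\theta^3)$
5 | $(3 - 6s^2 + s^4)/(2\sqrt{6})$ | $(15r - 10r^3 + r^5)/(2\sqrt{30})$ | $1 + 5470\theta^2 + O(\theta^4)$
6 | $(15s - 10s^3 + s^5)/(2\sqrt{30})$ | $(-15 + 45r^2 - 15r^4 + r^6)/(12\sqrt{5})$ | $1 + 40\sqrt{30}\,\theta + 49122\theta^2 + O(\theta^3)$
7 | $(-15 + 45s^2 - 15s^4 + s^6)/(12\sqrt{5})$ | $(-105r + 105r^3 - 21r^5 + r^7)/(12\sqrt{35})$ | $1 + 441392\theta^2 + O(\theta^4)$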

1) Gaussian case: Assume that $s \sim \mathcal{N}(0, \sigma^2)$. (33) becomes even simpler: $\eta \, t(r) - r t'(r)/\sigma^2 + t''(r) = 0$. The 
solution is a linear combination of two 'independent' (i.e. their Wronskian is not null) confluent hypergeometric 
functions of the first kind taken in $r^2 / 2\sigma^2$: 

$$t^{(1)}(r) = {}_1F_1\left(-\frac{\sigma^2 \eta}{2}; \frac{1}{2}; \frac{r^2}{2\sigma^2}\right), \tag{34}$$

$$t^{(2)}(r) = r \; {}_1F_1\left(\frac{1 - \sigma^2 \eta}{2}; \frac{3}{2}; \frac{r^2}{2\sigma^2}\right). \tag{35}$$

If $\sigma^2 \eta$ is an even integer, $t^{(1)}$ is a polynomial function. If $\sigma^2 \eta$ is an odd integer, $t^{(2)}$ is a polynomial function. 
Another way to see this is to recognize this latter differential equation as the Hermite equation when $\eta$ is a positive 
integer and $\sigma^2 = 1$. Therefore, if $\eta \sigma^2 = k \in \mathbb{N}$, $t_k(r) = \kappa_k H_k(r / \sigma)$, $H_k$ being the Hermite polynomial of order 
$k$. This family of polynomials is known to be orthogonal with a weighting function (footnote 1) $\exp(-r^2 / 2\sigma^2)$. In our context, 
this is confirmed by (30), which reduces to the value of the integrand on the boundaries of an increasing interval 
of $\mathbb{R}$. The condition $C = 0$ is satisfied because $\lim_{r \to \infty} r^m \exp(-r^2 / 2\sigma^2) = 0$, $\forall m \in \mathbb{N}$. In the sequel, we call 
this set of fundamental solutions the 'polynomial family'. 
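The claim that Hermite polynomials solve the scalar Gaussian equation can be verified exactly with integer polynomial arithmetic. The sketch below assumes $\sigma = 1$ and the probabilists' convention with leading coefficient 1, built from the standard recurrence $H_{k+1}(x) = x H_k(x) - k H_{k-1}(x)$. It returns the coefficients of $k\,t(r) - r\,t'(r) + t''(r)$ for $t = H_k$, which should all vanish. Function names are illustrative.

```python
def hermite(k):
    """Coefficients (lowest degree first) of the probabilists' Hermite polynomial H_k."""
    prev, cur = [1], [0, 1]  # H_0(x) = 1, H_1(x) = x
    if k == 0:
        return prev
    for m in range(1, k):
        # H_{m+1}(x) = x * H_m(x) - m * H_{m-1}(x)
        nxt = [0] + cur
        for i, c in enumerate(prev):
            nxt[i] -= m * c
        prev, cur = cur, nxt
    return cur


def derivative(p):
    """Coefficients of the derivative of polynomial p."""
    return [i * c for i, c in enumerate(p)][1:] or [0]


def residual(k):
    """Coefficients of k*t(r) - r*t'(r) + t''(r) for t = H_k (sigma = 1)."""
    t = hermite(k)
    d1 = derivative(t)
    d2 = derivative(d1)
    res = [k * c for c in t] + [0]
    for i, c in enumerate(d1):
        res[i + 1] -= c  # multiplying t'(r) by r shifts degrees up by one
    for i, c in enumerate(d2):
        res[i] += c
    return res


if __name__ == "__main__":
    for k in range(8):
        print(k, hermite(k), all(c == 0 for c in residual(k)))
```

Since the arithmetic is exact, a zero residual for each order $k$ confirms that $\eta = k$ is the efficiency associated with $H_k$ in the unit-variance case.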

Table I gives the expressions of the first elements of this family and their associated embedding functions. 
Figure 1 shows a plot of the detection functions of these first elements. 
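The variance expansions of Table I can be cross-checked numerically. The following sketch is an illustration only: it takes the line $\eta = 3$ of the table, evaluates $\mathrm{Var}\{t(s + \theta w(s))\}$ for a standard Gaussian $s$ by trapezoidal integration, and estimates the coefficient of $\theta^2$ with a symmetric finite difference. The step size, the integration range, and all function names are assumptions of this sketch.

```python
import math


def gaussian_expectation(g, half_width=12.0, steps=24000):
    """E{g(S)} for S ~ N(0, 1), computed with the trapezoidal rule."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        s = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(s) * math.exp(-0.5 * s * s)
    return total * h / math.sqrt(2.0 * math.pi)


def variance_under_h1(t, w, theta):
    """Var{t(r)} with r = s + theta * w(s) and s ~ N(0, 1)."""
    m1 = gaussian_expectation(lambda s: t(s + theta * w(s)))
    m2 = gaussian_expectation(lambda s: t(s + theta * w(s)) ** 2)
    return m2 - m1 * m1


def second_order_coefficient(t, w, theta=1e-3):
    """Estimate of the theta^2 coefficient of Var{t(r) | H1} around theta = 0."""
    v0 = variance_under_h1(t, w, 0.0)
    v_plus = variance_under_h1(t, w, theta)
    v_minus = variance_under_h1(t, w, -theta)
    return (v_plus + v_minus - 2.0 * v0) / (2.0 * theta * theta)


def t3(r):
    """Detection function of the line eta = 3 of Table I."""
    return (r ** 3 - 3.0 * r) / math.sqrt(6.0)


def w3(s):
    """Embedding function of the line eta = 3 of Table I."""
    return (s * s - 1.0) / math.sqrt(2.0)
```

Calling `second_order_coefficient(t3, w3)` should return a value close to 66, in agreement with the table entry $1 + 66\theta^2 + o(\theta^3)$.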

The first line of this table is the well-known direct spread spectrum scheme with a linear correlator, the optimal 
detector in the Gaussian i.i.d. case. The second line is known as the proportional or multiplicative embedding, first 
proposed in [23, Sect. 4.2] for perceptual reasons (i.e., it is known that a greater embedding power is not visible when 
watermarking wavelet coefficients with a proportional embedding, in comparison to a simple additive embedding). 
A higher efficiency per element is another inherent advantage of proportional embedding. The remaining lines of 
this table generalize this idea to new schemes (as far as the author knows). 
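As an illustration of the second line of Table I, the sketch below simulates proportional embedding on a synthetic white Gaussian host and applies the matching quadratic detector. It is a toy experiment under stated assumptions: unit-variance host, secret signs $e_i = \pm 1$ drawn from a seeded generator, and a detection function normalized as $t(\mathbf{r}) = n^{-1/2} \sum_i e_i (r_i^2 - 1)/\sqrt{2}$ so that it has unit variance under $\mathcal{H}_0$. The names `embed` and `detect` are hypothetical.

```python
import math
import random


def embed(s, e, theta):
    """Proportional embedding: y_i = s_i + theta * e_i * s_i."""
    return [si + theta * ei * si for si, ei in zip(s, e)]


def detect(r, e):
    """Quadratic detector t(r) = n^{-1/2} * sum_i e_i * (r_i^2 - 1) / sqrt(2)."""
    n = len(r)
    return sum(ei * (ri * ri - 1.0) for ri, ei in zip(r, e)) / math.sqrt(2.0 * n)


rng = random.Random(2008)                       # fixed seed for reproducibility
n, theta = 20000, 0.1
host = [rng.gauss(0.0, 1.0) for _ in range(n)]  # white Gaussian host, sigma = 1
secret = [rng.choice((-1.0, 1.0)) for _ in range(n)]

score_h0 = detect(host, secret)                          # original content
score_h1 = detect(embed(host, secret, theta), secret)    # watermarked content
expected_shift = theta * math.sqrt(2.0 * n)              # theta * sqrt(n * eta), eta = 2
```

The score of the original content behaves like a standard normal variable, whereas the score of the watermarked content is shifted by roughly $\theta \sqrt{n \eta}$ with $\eta = 2$, i.e. about 20 in this configuration.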

2) Uniform case: The classical 'flat-host' assumption used in DC-DM scheme studies states that the host 
pdf is a piecewise constant function. More precisely, we assume here the host pdf can be written as $p_S(s) = \sum_{i=-\infty}^{+\infty} p_i \Pi_i(s)$, 
with $\Pi_i$ the indicator function of the elementary interval $\left[\frac{2\pi}{\sqrt{\eta}} i, \frac{2\pi}{\sqrt{\eta}} (i + 1)\right)$, and $\sum_{i=-\infty}^{+\infty} p_i = \sqrt{\eta} / 2\pi$. 
In this case, (33), defined almost everywhere (footnote 2), is a lot simpler: $\eta \, t(r) + t''(r) = 0$, whose obvious solution (footnote 3) 
is $t(r) = \sqrt{2} \cos(\sqrt{\eta}\, r)$ and hence, $w(s) = -\sqrt{2} \sin(\sqrt{\eta}\, s)$. Although these are not exactly the sawtooth embedding 
function of the scalar DC-DM (a.k.a. SCS), we at least recover periodic functions. 

Footnote 1. This is the probabilists' definition of Hermite polynomials. However, these polynomials take different forms according to the chosen 
standardization. For instance, $\kappa_k = 1 / \sqrt{k!}$ when the coefficient of highest order of $H_k$ is set to 1. 

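If the 'flat-host' assumption holds on the above partition of $\mathbb{R}$, then it also holds on the finer partition $\bigcup_{i=-\infty}^{+\infty} \left[\frac{2\pi}{k \sqrt{\eta}} i, \frac{2\pi}{k \sqrt{\eta}} (i + 1)\right)$, 
$k \in \mathbb{N}$. This gives birth to another fundamental solution $t_k(r) = \sqrt{2} \cos(k \sqrt{\eta}\, r)$, whose efficiency per element 
is $k^2$ times greater. We call the sinusoidal family the set of fundamental solutions $t_k$ indexed with integers. Once again, 
elements of this family are orthonormal: 

$$(t_k, t_\ell) = \frac{\sqrt{\eta}}{\pi} \int_0^{2\pi / \sqrt{\eta}} \cos(k \sqrt{\eta}\, r) \cos(\ell \sqrt{\eta}\, r) \, dr = \delta(k - \ell). \tag{36}$$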
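The orthonormality of the sinusoidal family is easy to check numerically. The sketch below assumes the simplest flat host compatible with the partition, namely a host uniformly distributed over one period $[0, 2\pi/\sqrt{\eta})$, and evaluates $E\{t_k(S) t_\ell(S)\}$ with the midpoint rule. The value of $\eta$ and the function names are arbitrary choices of this illustration.

```python
import math


def t_sin_family(k, r, eta):
    """Element t_k(r) = sqrt(2) * cos(k * sqrt(eta) * r) of the sinusoidal family."""
    return math.sqrt(2.0) * math.cos(k * math.sqrt(eta) * r)


def inner_product(k, l, eta=4.0, steps=20000):
    """(t_k, t_l) = E{t_k(S) t_l(S)} for S uniform over [0, 2*pi/sqrt(eta))."""
    period = 2.0 * math.pi / math.sqrt(eta)
    h = period / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h  # midpoint rule
        total += t_sin_family(k, r, eta) * t_sin_family(l, r, eta)
    return total * h / period  # the uniform pdf equals 1 / period


if __name__ == "__main__":
    for k in (1, 2, 3):
        print([round(inner_product(k, l), 6) for l in (1, 2, 3)])
```

For positive integers $k$ and $\ell$, the computed inner products form an identity matrix up to rounding errors.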

B. The vector case 

Subsection IV-A uses the Cartesian system where the embedding proceeds in a sample-wise manner. We generalize this idea 
to block based watermarking schemes assuming there exists an integer $p$ dividing $n$ so that $\mathbb{R}^n = \mathbb{R}^p \times \mathbb{R}^p \times \cdots \times \mathbb{R}^p$ 
and that $p_S(\mathbf{s}) = \prod_{i=1}^{n/p} p(s_{(i-1)p+1}, \cdots, s_{(i-1)p+p})$. If $t^{(p)}$ is a solution of the fundamental equation in $\mathbb{R}^p$ with a 
given efficiency, then $t^{(n)}(\mathbf{r}) = \sqrt{p/n} \sum_{i=1}^{n/p} t^{(p)}(r_{(i-1)p+1}, \cdots, r_{(i-1)p+p})$ is a solution in $\mathbb{R}^n$ yielding the same 
efficiency. This realizes a statistically independent embedding in the sense that the block of $p$ watermark samples 
only depends on the same block of $p$ host samples. The issue is now to find solutions $t^{(p)}$. A usual technique 
is the separation of variables method in a specific orthogonal coordinate system [24]. 

Footnote 2. Except on the boundaries due to discontinuities. This has little importance as the probability that the host signal is on a boundary is zero. 
Footnote 3. The other solution $\{t(r) = \sqrt{2} \sin(\sqrt{\eta}\, r), w(s) = \sqrt{2} \cos(\sqrt{\eta}\, s)\}$ is valid on a shifted partition. 

1) Separation of variables: Classically, the separation of variables method considers a solution $t^{(p)}(\mathbf{r}) = \prod_{i=1}^{p} t_{\eta_i}(r_i)$, 
where each $t_{\eta_i}$ has to satisfy (33) with its own efficiency $\eta_i$. The resulting efficiency of $t^{(p)}$ 
is then $\eta = \sum_{i=1}^{p} \eta_i$. For white Gaussian hosts, this gives birth to an extension of the polynomial family which is 
indeed based on the multivariate Hermite polynomials, indexed by the $p$-tuple $\mathbf{k} \in \mathbb{N}^p$: $H_{\mathbf{k}}(\mathbf{r}) = \prod_{i=1}^{p} H_{k_i}(r_i)$. 
Two different elements of this family are orthogonal for the scalar product (26), even if they share the same 
efficiency per element. 

This extension of the polynomial family is illustrated in the following example. If $\mathbf{S} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n)$, then 
$\nabla p_S(\mathbf{r}) = -p_S(\mathbf{r}) \mathbf{r} / \sigma^2$, and (24) becomes $\eta \, t(\mathbf{r}) - \mathbf{r}^T \nabla t(\mathbf{r}) / \sigma^2 + \nabla^2 t(\mathbf{r}) = 0$. JANIS, a zero-bit watermarking 
scheme heuristically invented some years ago [5], [17], is a fundamental solution. Its detection function is the 
following one: 

$$t(\mathbf{r}) = \sqrt{\frac{p}{n}} \sum_{i=1}^{n/p} \prod_{j=1}^{p} \frac{r_{(i-1)p+j}}{\sigma}. \tag{37}$$

Note that $r_j$ appears only once in the detection function, $\forall j \in \{1, \cdots, n\}$. It is easy to see that $\mathbf{r}^T \nabla t(\mathbf{r}) = p \, t(\mathbf{r})$ 
and $\nabla^2 t(\mathbf{r}) = 0$. Thus, JANIS with order $p$ is a solution to (24) provided that $\eta \sigma^2 = p$. This can be interpreted as 
follows: this is a block based watermarking scheme built on the $p$-multivariate Hermite polynomial $H_{1, \cdots, 1}$. This 
theoretical framework proves the optimality of the heuristic JANIS scheme. 
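The two identities used for JANIS hold for any product of distinct coordinates and can be checked by finite differences. The sketch below is an illustration under simplifying assumptions: a single block of $p = 3$ samples, $\sigma = 1$, and the unnormalized block function $t(\mathbf{r}) = r_1 r_2 r_3$. The helper names are hypothetical.

```python
import math


def janis_block(block):
    """Product of the samples of one block (sigma = 1, normalization omitted)."""
    return math.prod(block)


def gradient(f, r, h=1e-5):
    """Central finite-difference estimate of the gradient of f at r."""
    grad = []
    for i in range(len(r)):
        up = list(r)
        down = list(r)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def laplacian(f, r, h=1e-3):
    """Second-order finite-difference estimate of the Laplacian of f at r."""
    total = 0.0
    for i in range(len(r)):
        up = list(r)
        down = list(r)
        up[i] += h
        down[i] -= h
        total += (f(up) - 2.0 * f(r) + f(down)) / (h * h)
    return total


r = [0.7, -1.3, 2.1]
p = len(r)
euler_term = sum(ri * gi for ri, gi in zip(r, gradient(janis_block, r)))  # r^T grad t
```

With these assumptions, `euler_term` matches $p \, t(\mathbf{r})$ and the Laplacian vanishes, so the fundamental equation reduces to $(\eta - p)\, t(\mathbf{r}) = 0$, which holds for $\eta = p$.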

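Separation of variables can be done on another coordinate system. The following spherical coordinate system 
$(\rho, \theta_1, \cdots, \theta_{p-1})$ is adapted to isotropic host distributions, i.e. $p_S(\mathbf{s}) = f(\rho)$ with $\rho = \|\mathbf{s}\|$: 

$$\begin{aligned}
r_1 &= \rho \sin\theta_{p-1} \sin\theta_{p-2} \cdots \sin\theta_2 \sin\theta_1 \\
r_2 &= \rho \sin\theta_{p-1} \sin\theta_{p-2} \cdots \sin\theta_2 \cos\theta_1 \\
r_3 &= \rho \sin\theta_{p-1} \sin\theta_{p-2} \cdots \cos\theta_2 \\
&\;\;\vdots \\
r_{p-1} &= \rho \sin\theta_{p-1} \cos\theta_{p-2} \\
r_p &= \rho \cos\theta_{p-1}.
\end{aligned}$$

For instance, we seek a function $t(\mathbf{r}) = t(\rho, \theta_{p-1}) = U(\rho) V(\theta_{p-1})$, which depends on two simple statistics 
$\rho^2 = \sum_{i=1}^{p} r_i^2$ and $\theta_{p-1} = \arccos(\mathbf{r}^T \mathbf{e}_p / \|\mathbf{r}\|)$. $\mathbf{e}_p$ is a secret unit vector shared by the embedder and the detector, 
taken as the $p$-th element of the canonical basis (i.e. in the Cartesian coordinate system). Separating variables in 
(25) yields two equations: 

$$K V(\theta) + (p - 2) \cot\theta \, V'(\theta) + V''(\theta) = 0 \tag{38}$$

$$(\eta \rho^2 - K) U(\rho) + \left( (p - 1) \rho + \rho^2 \frac{f'(\rho)}{f(\rho)} \right) U'(\rho) + \rho^2 U''(\rho) = 0 \tag{39}$$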



February 1, 2008 



DRAFT 






with $K \in \mathbb{R}$. The choice $U(\rho) = \kappa \rho^2$ and $V(\theta) = p \cos^2\theta - 1$ is a solution provided $f'(\rho) / f(\rho) = -\rho / \sigma^2$ (white 
Gaussian host), $K = 2p$ and $\eta \sigma^2 = 2$. The detection function is then 

$$t(\mathbf{r}) = \kappa \left( (\sqrt{p}\, \mathbf{r}^T \mathbf{e}_p)^2 - \|\mathbf{r}\|^2 \right) = \kappa \left( (p - 1) r_p^2 - \sum_{i=1}^{p-1} r_i^2 \right). \tag{40}$$

$t(\mathbf{r}) = \tau$ defines a $p$-dimensional two-sheet hyperboloid. This is close to a two-sheet hypercone, the acceptance region 
of the absolute normalized correlation, which is the optimum detection function based on such simple statistics 
for a Gaussian white host [3]. We agree here with N. Merhav and E. Sabbag that the acceptance region must be a 
two-sheet geometric form, contrary to the well-known normalized correlation and its one-sheet hypercone [1]. Yet, 
neither the absolute normalized correlation nor the famous normalized correlation are fundamental solutions. We 
suppose that this stems from the difference in the models of the perceptual constraint: fixed embedding power vs. 
random small and positive gain. Eq. (40) is however not unknown in the watermarking literature. This is the measure 
of robustness given in the book by Cox et al. [1, Eq.(5.13)]. 
Let us now invent a host such that 

$$P(\mathbf{s} \in B_p(R)) = \begin{cases} R / R_0, & \text{if } R \leq R_0 \\ 1, & \text{if } R > R_0. \end{cases}$$

This extension of the one-dimensional uniform distribution (in the sense that, in one dimension, a uniform distribution 
gives a linear cumulative distribution function over the interval $B_1(R)$) implies that its isotropic pdf equals $f(\rho) = \rho^{1-p} / R_0$ 
if $0 < \rho < R_0$ (0, else). A solution in the form $t(\mathbf{r}) = U(\rho)$ must then satisfy $\eta U(\rho) + U''(\rho) = 0$, 
whose solutions are as follows: 

$$t^{(1)}(\rho) = \sqrt{2\, \mathrm{surf}(S_p(1))} \cos(\sqrt{\eta}\, \rho) \quad \text{with } \sqrt{\eta} R_0 \equiv 0 \; [\pi], \tag{41}$$

$$t^{(2)}(\rho) = \sqrt{2\, \mathrm{surf}(S_p(1))} \sin(\sqrt{\eta}\, \rho) \quad \text{with } \sqrt{\eta} R_0 \equiv 0 \; [2\pi]. \tag{42}$$

$\mathrm{surf}(S_p(1))$ is the surface area of the $p$-hypersphere of unit radius: $\mathrm{surf}(S_p(1)) = 2 \pi^{p/2} / \Gamma(p/2)$. This solution 
looks like the sphere hardening dither modulation scheme invented by F. Balado [25, Sect. 5]. 

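2) Sparsity: Many possible coordinate systems allow a separation of variables [24], but their investigation is 
out of the scope of this paper. Instead, we would like here to rediscover a famous principle in watermarking. 
Suppose we know a solution $t^\star$ to the scalar equation: $\eta^\star t^\star(x) + f(x) t^{\star\prime}(x) + t^{\star\prime\prime}(x) = 0$. We would like to extend 
this solution by considering a solution in the form $t = t^\star \circ g$, with $g : \mathbb{R}^p \to \mathbb{R}$ a differentiable function. Gradient and 
Laplacian have the following expressions: 

$$\nabla t(\mathbf{r}) = t^{\star\prime}(g(\mathbf{r})) \nabla g(\mathbf{r}), \qquad \nabla^2 t(\mathbf{r}) = t^{\star\prime\prime}(g(\mathbf{r})) \|\nabla g(\mathbf{r})\|^2 + t^{\star\prime}(g(\mathbf{r})) \nabla^2 g(\mathbf{r}), \tag{43}$$

and the fundamental equation becomes: 

$$t^{\star\prime}(g(\mathbf{r})) \left( -\frac{\eta}{\eta^\star} f(g(\mathbf{r})) + \frac{\nabla p_S(\mathbf{r})^T}{p_S(\mathbf{r})} \nabla g(\mathbf{r}) + \nabla^2 g(\mathbf{r}) \right) + t^{\star\prime\prime}(g(\mathbf{r})) \left( \|\nabla g(\mathbf{r})\|^2 - \frac{\eta}{\eta^\star} \right) = 0. \tag{44}$$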
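A linear form, i.e. a projection $g(\mathbf{r}) = \mathbf{r}^T \boldsymbol{\lambda}$, is a solution providing the following simplifications: $\nabla^2 g(\mathbf{r}) = 0$ and 
$\|\nabla g(\mathbf{r})\| = \|\boldsymbol{\lambda}\|$. Then, $t$ is a fundamental solution with an efficiency per element $\eta = \eta^\star \|\boldsymbol{\lambda}\|^2$, provided we have: 

$$\frac{\nabla p_S(\mathbf{r})^T \boldsymbol{\lambda}}{p_S(\mathbf{r})} = \|\boldsymbol{\lambda}\|^2 f(\mathbf{r}^T \boldsymbol{\lambda}). \tag{45}$$

For a white Gaussian host, this implies that $f(x) = -x / (\|\boldsymbol{\lambda}\|^2 \sigma^2)$, which is the score (i.e. $p'(x) / p(x)$) associated 
to $\mathcal{N}(0, \|\boldsymbol{\lambda}\|^2 \sigma^2)$. Hence, the polynomial family is extended to the vector case with fundamental solutions of the 
form $t_k(\mathbf{r}) = \kappa_k H_k(\mathbf{r}^T \boldsymbol{\lambda} / (\|\boldsymbol{\lambda}\| \sigma))$ whose efficiency per element is $\eta = k / \sigma^2$. 

For the flat host assumption, $f$ appears to be the null function. Hence, the sinusoidal family is extended to the 
vector case with fundamental solutions of the form $t(\mathbf{r}) = k_t \cos(\mathbf{r}^T \boldsymbol{\lambda})$ whose efficacy is $\eta = \|\boldsymbol{\lambda}\|^2$. 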

This kind of solutions illustrates the principle known as sparsity or time sharing [26, Sect. 5.2 and 8.2], where 
the watermark embedding is processed on the projection r^A. A typical implementation of this principle is the 
Spread Transform Dither Modulation [26, Sect. 5.2]. 

3) Space partitioning: Under the flat host assumption, (25) reduces to the well-known Helmholtz equation: 
$\eta \, t(\mathbf{r}) + \nabla^2 t(\mathbf{r}) = 0$. Suppose $t^\star$ is a solution, then the composition of this function by a translation operator 
yields another solution: $t_0(\mathbf{r}) = t^\star(\mathbf{r} - \mathbf{r}_0)$. This property is due to the fact that the score $\nabla p_S(\mathbf{r}) / p_S(\mathbf{r})$ is invariant 
by translation since it is null. One can also mix different solutions, each defined over a specific region $C_i \subset \mathbb{R}^p$: 
$t(\mathbf{r}) = \sum_i t_i(\mathbf{r}) \Pi_i(\mathbf{r})$, with $\Pi_i(\cdot)$ the indicator function of region $C_i$. Assume now that regions $\{C_i\}$ constitute a 
partition of $\mathbb{R}^p$ and that the host pdf is a piecewise constant function such that $p_S(\mathbf{s}) = \sum_i p_i \Pi_i(\mathbf{s})$. Then, the 
above mixture is a solution of the fundamental equation, except on the boundaries of contiguous regions where the 
gradients of $p_S$ and $t$ are a priori not defined. 

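An elegant way to set a partition is to define the regions as the Voronoi cells of a $p$-dimensional lattice $\Lambda$:
$C_i = \mathcal{V} + \mathbf{c}_i$, $\mathbf{c}_i \in \Lambda$ and $\mathcal{V}$ the Voronoi cell centered on $\mathbf{0}$. With all these elements, we can write:

$$t(\mathbf{r}) = \sum_i t_i(\mathbf{r})\,1_i(\mathbf{r}) = \sum_i t^*(\mathbf{r} - \mathbf{c}_i)\,1_i(\mathbf{r}) = t^*(\mathbf{r} - Q(\mathbf{r})), \tag{46}$$

with $Q(.)$ the quantization function mapping onto $\Lambda$.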

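Under the flat host assumption, sparsity and space partitioning indeed give the same extension of the sinusoidal
family: $t_{\mathbf{k}}(\mathbf{r}) = \sqrt{2}\cos(\mathbf{r}^T\boldsymbol{\lambda}_{\mathbf{k}})$, when vector $\boldsymbol{\lambda}_{\mathbf{k}}$ is defined by $2\pi G^{-T}\mathbf{k}$, with $G$ the generator matrix of lattice $\Lambda$
and $\mathbf{k} \in \mathbb{N}^p$. $\mathbf{r}$ belonging to $C_i$ means that $\mathbf{r} = \mathbf{c}_i + \tilde{\mathbf{r}} = G\mathbf{n}_i + \tilde{\mathbf{r}}$, with $\mathbf{n}_i \in \mathbb{Z}^p$ and $\tilde{\mathbf{r}} \in \mathcal{V}$. Thus, $t(\mathbf{r}) = t(\tilde{\mathbf{r}})$
because $\mathbf{n}_i^T\mathbf{k} \in \mathbb{Z}$, $\forall(\mathbf{k}, \mathbf{n}_i) \in \mathbb{N}^p \times \mathbb{Z}^p$. This gives $\eta = \|\boldsymbol{\lambda}_{\mathbf{k}}\|^2 = 4\pi^2\|G^{-T}\mathbf{k}\|^2$. Once again, this is not exactly the
lattice quantizer based watermarking scheme, but at least we find back solutions which are periodic with respect
to a lattice.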
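
The periodicity argument can be sketched in a few lines. This is a minimal illustration, assuming a two-dimensional hexagonal generator matrix, an index vector and a test point that are all chosen here for the example: it builds $\boldsymbol{\lambda}_{\mathbf{k}} = 2\pi G^{-T}\mathbf{k}$ and checks that shifting $\mathbf{r}$ by a lattice point $G\mathbf{n}$ leaves $t_{\mathbf{k}}$ unchanged.

```python
import math

# Columns of G are the lattice basis vectors (illustrative hexagonal lattice, an assumption).
G = [[2.0, 1.0], [0.0, math.sqrt(3.0)]]

def inv_transpose_2x2(m):
    """Return (m^{-1})^T for a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -c / det], [-b / det, a / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def lam(k):
    """lambda_k = 2 pi G^{-T} k."""
    return [2.0 * math.pi * x for x in matvec(inv_transpose_2x2(G), k)]

def t(r, k):
    """Extended sinusoidal solution t_k(r) = sqrt(2) cos(r^T lambda_k)."""
    l = lam(k)
    return math.sqrt(2.0) * math.cos(r[0] * l[0] + r[1] * l[1])

k = [1, 2]                                   # integer index vector (illustrative)
r = [0.37, -0.81]                            # arbitrary point
shift = matvec(G, [3, -5])                   # lattice point G n with integer n
r_shifted = [r[0] + shift[0], r[1] + shift[1]]
eta = sum(x * x for x in lam(k))             # efficacy ||lambda_k||^2
print(t(r, k), t(r_shifted, k), eta)
```

Both evaluations agree because $(G\mathbf{n})^T\boldsymbol{\lambda}_{\mathbf{k}} = 2\pi\,\mathbf{n}^T\mathbf{k}$ is a multiple of $2\pi$.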

To conclude, the goal of this section is to show that several well-known watermarking schemes are indeed
solutions of the fundamental equation, underlining the unifying character of this theoretical framework.

V. Conditions, limitations, and extensions

A. Conditions 

Many assumptions have been made to derive the fundamental equation and we would like to collect and state 
them explicitly in this section before providing some limitations and extensions. 

First, at the embedding side, the model of the perceptual constraint is based on the masking phenomenon, modeled
as a perceptual gain $\theta$. Whereas this article focuses on a scalar gain for the sake of simplicity, in practice, it is likely
to be a vector of positive and small values locally adapting the power of the watermark signal to the power of the
masking effect. The main fact is that this gain is unknown when generating the energy constrained signal $\mathbf{w}(\mathbf{s})$, and
unknown at the detection side. This model is quite different from the classical power or energy constraint, which
imposes a fixed amount of embedding distortion.

Second, in this paper, schemes are claimed optimal if they maximize the efficiency per sample. This meaning
of optimality only holds when the Pitman Noether theorem can be applied, i.e. for schemes fulfilling the following
regularity assumptions [11, Sect. III.C.3]:

• The energy of the watermark signal and the variance of the tested statistic must be bounded. Without loss
of generality, we impose $\mathrm{E}_{\mathbf{S}}\{\|\mathbf{w}(\mathbf{s})\|^2\} = n$ and $\mathrm{E}_{\mathbf{R}}\{t(\mathbf{r})^2\} = 1$.

• The smoothness conditions on the density $p(.|\mathcal{H}_1)$ as a function of $\theta$ and on the non-linearity $t(.)$ such that
Eq. ^ holds, 

• The convergence in law of the statistic $t(\mathbf{R})$ to a normal variable under both hypotheses.

Moreover, we also restrict our study to detection functions defined in $\mathbb{R}^n$ and at least twice differentiable except on a
zero-measure set, to guarantee the existence of their gradient and Laplacian. Then, the above study can be summarized in
the following proposition.

Proposition 1: Suppose a zero-bit watermarking scheme based on the embedding and detection functions $\{\mathbf{w}(.), t(.)\}$
satisfies the above-mentioned conditions. Then, this scheme is optimal for a given efficacy $\eta$ and when there is no
attack, if and only if $t(.)$ is a solution of the fundamental equation (24) and $\mathbf{w}(\mathbf{s}) = k_w\nabla t(\mathbf{s})$, $\forall\mathbf{s} \in \mathbb{R}^n$.

The convergence in law to a normal variable is a very restrictive condition. When the host samples are i.i.d. (or
block based i.i.d.), a block based embedding gives an elegant solution because its matched detection function is
the sum of $n/p$ i.i.d. random variables. The parameter $p$ must be fixed to ensure the asymptotic normality by the
central limit theorem (as $\mathrm{E}\{t^{(p)}(\mathbf{r})^2\} < +\infty$).

Proposition 2: The principle of block based embedding gives birth to two important families of detection
functions: sums of $p$-multivariate Hermite polynomials for white Gaussian hosts, and sums of cosine functions
periodically defined on $p$-dimensional lattices for flat hosts. Both families gather orthonormal functions for the scalar
product defined by (26).

B. Limitations 

The Pitman Noether theorem states that the efficacy is a criterion for optimality only asymptotically. This makes 
sense in our study because the watermark signal is deeply embedded in the host, thus requiring spreading of the 
mark on long sequences. In the same way, efficacy is very useful in applications such as passive sonar and radio
astronomy, also dealing with weak signals and long integration times. 

Our framework nicely gives a unified theory gathering many known watermarking schemes. However, not all new
fundamental solutions may be adequate for practical implementations where host signals are not so long, or $\theta$
is not so small. We foresee at least two reasons:

• When $\theta$ is not so small, the variance under $\mathcal{H}_1$ grows very fast with the efficacy, as shown in Appendix II
and in Table IV-A.1.

• The Berry-Esseen theorem shows that the rate of convergence to the normal distribution depends on the third 
moment of t{.), which we suspect to be fast increasing with the efficacy. 

A proper study requires a non-asymptotic analysis of the performances, which is out of the scope of this article. Some
experimental works can be found in the literature. For instance, the $p$-multivariate Hermite polynomial based family
of detection functions has already been experimentally tested under the abbreviation JANIS: in [17], the efficacy is
given by the order of the JANIS scheme, i.e. $\eta = p$. The ROC curve (i.e. $P_p = P_p(P_{fa})$ for a given embedding gain)
and the 'power' curve (i.e. $P_p = P_p(\theta)$ for a given $P_{fa}$) are largely improved compared to the performances of the spread
spectrum watermarking scheme (see respectively Fig. 3 and Fig. 4 in [17]). However, for a given vector length,
the performances predicted from a normal distribution of the tested statistic clearly mismatch the experimental
measurements as the efficacy increases and as the parameter $\theta$ increases. Hence, whereas the
central limit theorem proves the asymptotic convergence in law needed in the theoretical framework, in any case,
it shall not be used to estimate performances in practice. Another lesson learnt from [17] is that a scheme with
a higher efficacy can perform more poorly than another one in a non-asymptotic regime. In Fig. 3 of [17], the
scheme with p = 5 yields a higher power than the one with p = 4 only if Pfa > 10^"^, with n = 2400 for both 
schemes. 

This study provides a somewhat elegant, constructive and unifying theoretical framework; unfortunately,
it does not give clear guidelines on the design of a watermarking scheme in a non-asymptotic regime.

C. Extension to asymmetric tests 

So far, the main idea of the paper is to take advantage of the knowledge of the host value $\mathbf{s}$ to boost the efficiency
per element. This results in the increase of $\mathrm{E}_{\mathbf{R}}\{t(\mathbf{r})|\mathcal{H}_1\} = \theta\sqrt{n\eta} + O(\theta^2)$, while the variance $\mathrm{Var}\{t(\mathbf{r})|\mathcal{H}_1\}$ is
maintained at the level of $\mathrm{Var}\{t(\mathbf{r})|\mathcal{H}_0\}$ at least to the first order. Asymptotically, the test has to make a clear cut
between two distributions having the same variance. This is sometimes called a symmetric test. This subsection
focuses on the variance $\mathrm{Var}\{t(\mathbf{r})|\mathcal{H}_1\}$. As H. Malvar and D. Florencio did for zero-rate watermarking [27], we
would like to control the value of $\mathrm{Var}\{t(\mathbf{r})|\mathcal{H}_1\}$, achieving a so-called asymmetric test.¹

¹ Be careful not to confuse with asymmetric watermarking where the detection key is different from the embedding private key.

The watermark signal already depends on the host through the vector $\mathbf{w}(\mathbf{s})$ which pushes the host towards a
region in space where the detection function has a higher value, i.e. hopefully the acceptance region. We add here
another dependence which modulates the amplitude of this vector: host signals which are naturally far away from the
acceptance region are more strongly pushed than those near the acceptance region. We write the watermark signal
$\mathbf{x}(\mathbf{s}) = \theta k_w(\mathbf{s})\mathbf{w}(\mathbf{s})$. For a fair comparison with the previous sections, the constraint reads: $\mathrm{E}_{\mathbf{S}}\{k_w(\mathbf{s})^2\|\mathbf{w}(\mathbf{s})\|^2\} = n$.
The embedding strategy is not changed: $\mathbf{w}(\mathbf{s}) = \nabla t(\mathbf{s})$. Hence, we have:



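$$n = \mathrm{E}_{\mathbf{S}}\{k_w(\mathbf{s})^2\|\nabla t(\mathbf{s})\|^2\}, \tag{47}$$

$$\left.\frac{\partial}{\partial\theta}\mathrm{E}_{\mathbf{R}}\{t(\mathbf{r})|\mathcal{H}_1\}\right|_{\theta=0} = \mathrm{E}_{\mathbf{S}}\{k_w(\mathbf{s})\|\nabla t(\mathbf{s})\|^2\}, \tag{48}$$

$$\eta = \frac{\mathrm{E}_{\mathbf{S}}\{k_w(\mathbf{s})\|\nabla t(\mathbf{s})\|^2\}^2}{\mathrm{E}_{\mathbf{S}}\{k_w(\mathbf{s})^2\|\nabla t(\mathbf{s})\|^2\}}. \tag{49}$$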



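Now, the goal is to choose function $k_w$ such that it reduces the variance under $\mathcal{H}_1$:

$$\left.\frac{\partial}{\partial\theta}\mathrm{Var}\{t(\mathbf{r})|\mathcal{H}_1\}\right|_{\theta=0} = 2\,\mathrm{E}_{\mathbf{S}}\{t(\mathbf{s})\tilde{\nu}(\mathbf{s})\} \geq -2\sqrt{\mathrm{Var}\{\nu(\mathbf{s})\}}, \tag{50}$$

where $\nu(\mathbf{s}) = k_w(\mathbf{s})\|\nabla t(\mathbf{s})\|^2$ and its centered version is $\tilde{\nu}(\mathbf{s}) = \nu(\mathbf{s}) - \frac{\partial}{\partial\theta}\mathrm{E}_{\mathbf{R}}\{t(\mathbf{r})|\mathcal{H}_1\}|_{\theta=0}$. Since $\mathrm{E}_{\mathbf{S}}\{t(\mathbf{s})^2\} = 1$, the Cauchy-Schwarz
inequality gives $-2\sqrt{\mathrm{Var}\{\nu(\mathbf{s})\}}$ as the lower bound, with equality when $\tilde{\nu}(\mathbf{s}) = -c\,t(\mathbf{s})$, $c$ a positive constant.
Hence, we manage to reduce $\mathrm{Var}\{t(\mathbf{r})|\mathcal{H}_1\}$. However, this strategy consumes embedding distortion: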

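$$n = \mathrm{E}_{\mathbf{S}}\{k_w(\mathbf{s})^2\|\nabla t(\mathbf{s})\|^2\} = \mathrm{E}_{\mathbf{S}}\{\nu(\mathbf{s})^2\|\nabla t(\mathbf{s})\|^{-2}\}$$

$$= c^2\,\mathrm{E}_{\mathbf{S}}\{t(\mathbf{s})^2\|\nabla t(\mathbf{s})\|^{-2}\} + n\eta\,\mathrm{E}_{\mathbf{S}}\{\|\nabla t(\mathbf{s})\|^{-2}\} - 2c\sqrt{n\eta}\,\mathrm{E}_{\mathbf{S}}\{t(\mathbf{s})\|\nabla t(\mathbf{s})\|^{-2}\}. \tag{51}$$

For the simple cases explored in this paper, we are able to find a bijection $\mathbf{s}' = h(\mathbf{s})$ such that $p_{\mathbf{S}}(\mathbf{s}')t(\mathbf{s}')\|\nabla t(\mathbf{s}')\|^{-2} =
-p_{\mathbf{S}}(\mathbf{s})t(\mathbf{s})\|\nabla t(\mathbf{s})\|^{-2}$, which implies that the third term is null. Denote $a = \mathrm{E}_{\mathbf{S}}\{t(\mathbf{s})^2\|\nabla t(\mathbf{s})\|^{-2}\}$ and $b = \mathrm{E}_{\mathbf{S}}\{\|\nabla t(\mathbf{s})\|^{-2}\}$.
(51) finally reads:

$$n = ac^2 + bn\eta. \tag{52}$$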

A higher $c$ decreases $\mathrm{Var}\{t(\mathbf{r})|\mathcal{H}_1\}$ (first order approximation) but also $\eta$ due to the distortion constraint. In
practice, this strategy raises a crucial issue. Starting from a tested statistic having a symmetric distribution under
both hypotheses, a decrease of $\mathrm{Var}\{t(\mathbf{r})|\mathcal{H}_1\}$ yields a higher power of test only if $\mathrm{E}_{\mathbf{R}}\{t(\mathbf{r})|\mathcal{H}_1\}$ is greater than the
threshold $\tau > 0$. If this is not the case (for instance, due to an attack), then the impact of this strategy is just
the opposite. This phenomenon does not appear in [27], as this article tackles watermark decoding where the threshold
$\tau$ equals 0, the distributions under $\mathcal{H}_0$ (bit 1 has been hidden) and $\mathcal{H}_1$ (bit 0 has been hidden) being symmetric
around this value.

Experimental works about this variance reducing embedding strategy applied to the JANIS scheme are summarized
in [5, Sect. 6.4]. They stress the difficulty in finding an appropriate value of $c$ because it requires foreseeing an attack
scenario and its impact on the expectation of the tested statistic. The final rule applied in this experimental paper
is to set $c$ to the value which maximizes the Gaussian estimation of the power of test (which is, once again, a very
poor estimation). Results are mixed and more complex embedding strategies are investigated in [5, Sect. 6.4].

VI. Attack noise

When there is an attack, the received signal under $\mathcal{H}_1$ is $\mathbf{r}_1 = a(\mathbf{y})$. The attack channel $a$ is defined through a conditional
probability distribution $p_a(\mathbf{r}_1|\mathbf{y})$, whose associated attack power is $\sigma_z^2 = \iint \|\mathbf{r}_1 - \mathbf{y}\|^2 p_a(\mathbf{r}_1|\mathbf{y})\,p_{\mathbf{Y}}(\mathbf{y})\,d\mathbf{y}\,d\mathbf{r}_1/n$.
The parameters of the attack channel are unknown at the detection side. We would like to keep the detection as
simple as possible, so estimating these parameters is ruled out in this strategy. The performance of the
detector should degrade slowly with the strength of the attack, according to the definition of robust watermarking
given in [28].

The Pitman Noether theorem might then become useless because there is a disruption between the two hypotheses: $\mathcal{H}_1$
does not asymptotically converge to $\mathcal{H}_0$, in the sense that the regularity conditions (5) are violated due to the presence
of the attack channel only under $\mathcal{H}_1$.

We present here two ways to tackle this problem, changing our framework so that the Pitman Noether
theorem can be applied again. A first idea is to restrict our analysis to a fixed WNR (watermark to noise power ratio): $\theta_n^2/\sigma_z^2 = g$. The
received signal can be written as: $\mathbf{r}_1 = \mathbf{s} + \theta_n\mathbf{w}(\mathbf{s}) + \theta_n g^{-1/2}\mathbf{z}$, with $\mathrm{E}_{\mathbf{Z}}\{\|\mathbf{z}\|^2\} = n$. Therefore, the power of
the difference signal $\mathbf{r}_1 - \mathbf{r}_0$ asymptotically vanishes with $\theta_n$. The second idea considers attacks with fixed DNR
(document, i.e. host, to noise power ratio) where signals are corrupted by the same attack under both hypotheses
as T. Liu and P. Moulin did [6]. Yet, the targeted applications as described in our introduction do not a priori
motivate this possibility because attacks on unprotected contents under $\mathcal{H}_0$ are clearly unlikely. We argue that a
'soft' attack on original pieces of content still produces regular content. The attack channel changes the value of
the feature vectors, but it does not modify their inherent statistical structure.

Under both attack models, the fundamental equation appears to be statistically robust in the sense that it is not 
modified by the presence of the attack channel. However, this is only true for very particular conditions as described 
in the sequel. 

A. Fixed WNR attacks 

This subsection only shows that the fundamental equation remains unchanged when the watermarked signals
go through a fixed WNR AWGN attack channel.

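1) Best embedding function for a given detection function: As usual, we write:

$$\left.\frac{\partial}{\partial\theta}\mathrm{E}_{\mathbf{R}}\{t(\mathbf{r})|\mathcal{H}_1\}\right|_{\theta=0} = \iint \left.\frac{\partial}{\partial\theta}t(\mathbf{s} + \theta\mathbf{w}(\mathbf{s}) + \theta g^{-1/2}\mathbf{z})\right|_{\theta=0} p_{\mathbf{S}}(\mathbf{s})\,p_{\mathbf{Z}}(\mathbf{z})\,d\mathbf{s}\,d\mathbf{z} \tag{53}$$

$$= \int \mathbf{w}(\mathbf{s})^T\nabla t(\mathbf{s})\,p_{\mathbf{S}}(\mathbf{s})\,d\mathbf{s} + \iint g^{-1/2}\mathbf{z}^T\nabla t(\mathbf{s})\,p_{\mathbf{S}}(\mathbf{s})\,p_{\mathbf{Z}}(\mathbf{z})\,d\mathbf{s}\,d\mathbf{z}. \tag{54}$$

We assume $\mathbf{z}$ is independent of $\mathbf{s}$ and centered, so that the second term is null. We find back the same best embedder
as in the attack-free case.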

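2) Best detection function for a given embedding function: The pdf of $\mathbf{r}_1 = \mathbf{y} + \theta g^{-1/2}\mathbf{z}$ is given by the following
convolution:

$$p_{\mathbf{R}_1}(\mathbf{r}) = \int p_{\mathbf{Y}}(\mathbf{u})\,p_{\theta g^{-1/2}\mathbf{Z}}(\mathbf{r} - \mathbf{u})\,d\mathbf{u}, \tag{55}$$

whose derivative is composed of two terms:

$$\frac{\partial}{\partial\theta}p_{\mathbf{R}_1}(\mathbf{r}) = \int \frac{\partial p_{\mathbf{Y}}}{\partial\theta}(\mathbf{u})\,p_{\theta g^{-1/2}\mathbf{Z}}(\mathbf{r} - \mathbf{u})\,d\mathbf{u} + \int p_{\mathbf{Y}}(\mathbf{u})\,\frac{\partial}{\partial\theta}p_{\theta g^{-1/2}\mathbf{Z}}(\mathbf{r} - \mathbf{u})\,d\mathbf{u}. \tag{56}$$

We assume that $\mathbf{z}$ is normally distributed. Then, $\lim_{\theta\to 0} p_{\theta g^{-1/2}\mathbf{Z}}(\mathbf{r} - \mathbf{u})$ is the Dirac distribution. Hence, the first
term is, as detailed in Sect. III-A, $\frac{\partial}{\partial\theta}p_{\mathbf{Y}}(\mathbf{r})|_{\theta=0} = -\mathrm{div}(p_{\mathbf{S}}(\mathbf{r})\mathbf{w}(\mathbf{r}))$.

The second term is calculated along the lines of some proofs of De Bruijn's identity (see [29, Th. 16.6.2]).
It corresponds to the derivative of the pdf of $a(\mathbf{s}) = \mathbf{s} + \theta g^{-1/2}\mathbf{z}$ with respect to $\theta$. On the one hand, we have:

$$\frac{\partial}{\partial\theta}p_{a(\mathbf{S})}(\mathbf{r}) = \int p_{\mathbf{S}}(\mathbf{u})\left(\frac{g\|\mathbf{r} - \mathbf{u}\|^2}{\theta^3} - \frac{n}{\theta}\right)p_{\theta g^{-1/2}\mathbf{Z}}(\mathbf{r} - \mathbf{u})\,d\mathbf{u}. \tag{57}$$

On the other hand, it appears that:

$$\nabla^2 p_{a(\mathbf{S})}(\mathbf{r}) = \int p_{\mathbf{S}}(\mathbf{u})\left(\frac{g^2\|\mathbf{r} - \mathbf{u}\|^2}{\theta^4} - \frac{gn}{\theta^2}\right)p_{\theta g^{-1/2}\mathbf{Z}}(\mathbf{r} - \mathbf{u})\,d\mathbf{u} = \frac{g}{\theta}\,\frac{\partial}{\partial\theta}p_{a(\mathbf{S})}(\mathbf{r}). \tag{58}$$

Finally, the second term is null, because

$$\left.\frac{\partial}{\partial\theta}p_{a(\mathbf{S})}(\mathbf{r})\right|_{\theta=0} = \lim_{\theta\to 0}\frac{\theta}{g}\,\nabla^2 p_{a(\mathbf{S})}(\mathbf{r}) = 0, \tag{59}$$

and we find back the same best detection function as (12).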
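
The heat-equation-like identity behind (58) and (59) can be illustrated in one dimension. This is a minimal sketch under stated assumptions: a unit-variance Gaussian host, a noise of variance $\theta^2/g$ (the fixed-WNR convention $\theta^2/\sigma_z^2 = g$), and illustrative values for $g$ and the evaluation point. It compares a finite-difference estimate of $\partial p/\partial\theta$ with $(\theta/g)\,\partial^2 p/\partial r^2$.

```python
import math

def pdf_attacked(r, theta, g):
    """pdf of s + theta*g^(-1/2)*z for s, z ~ N(0, 1): a Gaussian of variance 1 + theta^2/g."""
    var = 1.0 + theta * theta / g
    return math.exp(-r * r / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def d_theta(r, theta, g, h=1e-5):
    """Central finite difference of the pdf with respect to theta."""
    return (pdf_attacked(r, theta + h, g) - pdf_attacked(r, theta - h, g)) / (2.0 * h)

def d2_r(r, theta, g, h=1e-3):
    """Central finite difference of the second derivative with respect to r."""
    return (pdf_attacked(r + h, theta, g) - 2.0 * pdf_attacked(r, theta, g)
            + pdf_attacked(r - h, theta, g)) / (h * h)

g, r = 4.0, 0.8   # illustrative WNR and evaluation point (assumptions)
checks = [(theta, d_theta(r, theta, g), theta / g * d2_r(r, theta, g))
          for theta in (0.5, 0.1, 0.01)]
for theta, lhs, rhs in checks:
    print(theta, lhs, rhs)
```

Both columns agree and shrink with $\theta$, which is the behaviour used to discard the second term of (56).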



B. Fixed DNR attacks 

The framework is changed so that the hypotheses are now: $\mathcal{H}_0 : \mathbf{r}_0 = a(\mathbf{s})$ against $\mathcal{H}_1 : \mathbf{r}_1 = a(\mathbf{s} + \theta\mathbf{w}(\mathbf{s}))$.
What are the impacts of this new framework on the detection and embedding functions?

As already said, our analysis only holds for attack channels conserving the statistical structure of the host signal.
The restrictions are as follows. For host $\mathbf{s} \sim \mathcal{N}(\mathbf{0}, I_n)$, the attack is an SAWGN channel: $a(\mathbf{s}) = \gamma(\mathbf{s} + \mathbf{z})$, with
$\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \sigma_z^2 I_n)$ independent of $\mathbf{s}$ and $\gamma = 1/\sqrt{1 + \sigma_z^2}$. The attack is a Wiener filtering for this very simple case,
which maintains $p(\mathbf{r}|\mathcal{H}_0)$ as a normal distribution. For the flat host assumption, the attack is an addition of an
independent noise: $a(\mathbf{s}) = \mathbf{s} + \mathbf{z}$. The new expression of $p(\mathbf{r}|\mathcal{H}_0)$ is given by a convolution, which renders the pdf
under $\mathcal{H}_0$ even flatter and larger. Consequently, at the scale of the watermarking signal, $p(\mathbf{r}|\mathcal{H}_0)$ is still a piecewise
constant function. The expression of the best detection function given the embedding function is not modified
when restricting to attack channels preserving $p(\mathbf{r}|\mathcal{H}_0)$.

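This is not the case for the best embedding function given the detection function. For the class of attack channels
considered in this paper, we can write $a(\mathbf{s}) = \gamma(\mathbf{s} + \mathbf{z})$ with $\gamma = 1$ for the additive noise attack, and $\gamma = 1/\sqrt{1 + \sigma_z^2}$
for the SAWGN attack. The derivative of the expectation of the tested statistic is then modified as follows:

$$\left.\frac{\partial}{\partial\theta}\mathrm{E}_{\mathbf{R}}\{t(\mathbf{r})|\mathcal{H}_1\}\right|_{\theta=0} = \iint \left.\frac{\partial}{\partial\theta}t(\gamma(\mathbf{s} + \theta\mathbf{w}(\mathbf{s}) + \mathbf{z}))\right|_{\theta=0} p_{\mathbf{Z}}(\mathbf{z})\,p_{\mathbf{S}}(\mathbf{s})\,d\mathbf{z}\,d\mathbf{s} \tag{60}$$

$$= \int\left(\int \gamma\,\mathbf{w}(\mathbf{s})^T\nabla t(\gamma(\mathbf{s} + \mathbf{z}))\,p_{\mathbf{Z}}(\mathbf{z})\,d\mathbf{z}\right)p_{\mathbf{S}}(\mathbf{s})\,d\mathbf{s}. \tag{61}$$

This last equation shows that the best strategy at the embedding side should set

$$\mathbf{w}(\mathbf{s}) \propto \mathrm{E}_{\mathbf{Z}}\{\nabla t(\gamma(\mathbf{s} + \mathbf{z}))\}. \tag{62}$$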
This implies that the embedder knows the attack channel parameters. This counter attack may not be realistic in
general, and we keep our former strategy given by (21), so that

$$\eta(\gamma, \sigma_z) = \frac{\gamma^2\eta(1,0)}{n^2}\left(\mathrm{E}_{\mathbf{S}}\{\mathbf{w}(\mathbf{s})^T\mathrm{E}_{\mathbf{Z}}\{\mathbf{w}(\gamma(\mathbf{s} + \mathbf{z}))\}\}\right)^2. \tag{63}$$

However, there are some cases where the counter attack (62) is surprisingly simple because it is indeed identical
to the regular embedding strategy (21) whatever the parameters of the attack channel. This occurs when $t$ is
such that $\mathrm{E}_{\mathbf{Z}}\{\nabla t(\gamma(\mathbf{s} + \mathbf{z}))\} = h(\gamma, \sigma_z)\nabla t(\mathbf{s})$. As a consequence, the fundamental equation (25), derived in the
no attack case, remains valid under these particular attack cases. The efficiency per element is then equal to

$$\eta(\gamma, \sigma_z) = \gamma^2 h(\gamma, \sigma_z)^2\,\eta(1, 0).$$

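For the polynomial family, we rewrite the Wiener filtering denoting $\tilde{z} = \sigma_z^{-1}z$ distributed as $\mathcal{N}(0, 1)$ and
$\alpha = \arccos(\gamma)$. A less familiar identity of the Hermite polynomials allows us to write:

$$t'_\ell(\gamma(s + z)) = \kappa_\ell\,\ell\,H_{\ell-1}(\cos(\alpha)s + \sin(\alpha)\tilde{z}) = \kappa_\ell\,\ell\sum_{k=0}^{\ell-1}\binom{\ell-1}{k}\cos^k(\alpha)\sin^{\ell-1-k}(\alpha)\,H_k(s)\,H_{\ell-1-k}(\tilde{z}). \tag{64}$$

$\mathrm{E}_Z\{t'_\ell(\gamma(s + z))\}$ reduces to $\mathrm{E}_{\tilde{Z}}\{t'_\ell(\gamma s + \sigma_z\gamma\tilde{z})\} = \kappa_\ell\,\ell\,\gamma^{\ell-1}H_{\ell-1}(s) = \gamma^{\ell-1}t'_\ell(s)$ because $\mathrm{E}_{\tilde{Z}}\{H_k(\tilde{z})\} = \delta(k)$.
Consequently, we can state the following proposition: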

Proposition 3: The polynomial family is a set of fundamental solutions for i.i.d. Gaussian hosts and SAWGN
attacks with Wiener filtering, whose efficiency per element is given by $\eta(\gamma, \sigma_z) = \ell\gamma^{2\ell}$. Wiener filtering means that
$\gamma = (1 + \sigma_z^2)^{-1/2}$.

Two noticeable exceptions are $t_1$ and $t_2$, whose efficiency follows the same rule whatever the value of $\gamma$ in the
SAWGN channel. Last but not least: the higher the 'original' efficiency $\eta(1, 0) = \ell$, the less robust is the scheme
in the sense that $\eta(\gamma, \sigma_z)/\eta(1, 0) = (1 + \sigma_z^2)^{-\eta(1,0)}$ decreases faster with the strength of the attack.
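
The attenuation rule $\mathrm{E}_Z\{t'_\ell(\gamma(s+z))\} = \gamma^{\ell-1}t'_\ell(s)$ behind Proposition 3 can be checked by numerical integration. The sketch below is illustrative only: the order, the host value and the noise level are arbitrary choices, the normalisation constant $\kappa_\ell\,\ell$ cancels in the ratio, and the expectation is estimated by a plain trapezoidal rule.

```python
import math

def hermite(n, x):
    """Probabilists' Hermite polynomial He_n(x) via the three-term recurrence."""
    h_prev, h = 1.0, x
    if n == 0:
        return h_prev
    for m in range(1, n):
        h_prev, h = h, x * h - m * h_prev
    return h

def expect_over_noise(func, sigma, steps=4001, span=12.0):
    """Trapezoidal estimate of E{func(z)} for z ~ N(0, sigma^2)."""
    lo, hi = -span * sigma, span * sigma
    dz = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        z = lo + i * dz
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * func(z) * math.exp(-z * z / (2.0 * sigma * sigma))
    return total * dz / (sigma * math.sqrt(2.0 * math.pi))

def attenuation(order, s, sigma):
    """Ratio E_Z{He_{order-1}(gamma (s + z))} / He_{order-1}(s) under Wiener scaling."""
    gamma = 1.0 / math.sqrt(1.0 + sigma * sigma)
    mean = expect_over_noise(lambda z: hermite(order - 1, gamma * (s + z)), sigma)
    return mean / hermite(order - 1, s), gamma

ratio, gamma = attenuation(order=4, s=0.7, sigma=0.5)   # illustrative values
print(ratio, gamma ** 3)
```

With the Wiener scaling, the ratio matches $\gamma^{\ell-1}$, so the efficiency per element $\gamma^2 h^2\eta(1,0)$ becomes $\ell\gamma^{2\ell}$.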
For the sinusoidal family, an additive noise leads to

$$\mathrm{E}_Z\{t'_\ell(s + z)\} = t'_\ell(s)\,\mathrm{E}_Z\{\cos(\ell\sqrt{\eta}\,z)\} - \ell\sqrt{2\eta}\,\cos(\ell\sqrt{\eta}\,s)\,\mathrm{E}_Z\{\sin(\ell\sqrt{\eta}\,z)\}. \tag{65}$$

The desired property holds whenever the attack noise has an even pdf, which sets the second term to zero. For
instance, the AWGN channel gives $\mathrm{E}_Z\{t'_\ell(s + z)\} = t'_\ell(s)\,e^{-\ell^2\eta\sigma_z^2/2}$. Consequently, we can state the following
proposition:

Proposition 4: The sinusoidal family is a set of fundamental solutions for flat hosts and additive symmetric noise
attacks. For the AWGN channel attack, its efficiency is given by $\eta(1, \sigma_z) = \ell^2\eta\,e^{-\ell^2\eta\sigma_z^2}$.

Once again, the higher the 'original' efficiency $\eta(1, 0)$, the less robust is the scheme in the sense that $\eta(\gamma, \sigma_z)/\eta(1, 0) =
e^{-\eta(1,0)\sigma_z^2}$ decreases faster with the strength of the attack.
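
The two robustness rules stated after Propositions 3 and 4 are easy to tabulate. The following sketch simply evaluates $(1+\sigma_z^2)^{-\eta(1,0)}$ and $e^{-\eta(1,0)\sigma_z^2}$ for a few illustrative efficiencies; the function names and the chosen noise level are assumptions made for this example.

```python
import math

def polynomial_ratio(eta0, sigma_z):
    """eta(gamma, sigma_z)/eta(1, 0) for the polynomial family (SAWGN with Wiener scaling)."""
    return (1.0 + sigma_z ** 2) ** (-eta0)

def sinusoidal_ratio(eta0, sigma_z):
    """eta(1, sigma_z)/eta(1, 0) for the sinusoidal family (AWGN)."""
    return math.exp(-eta0 * sigma_z ** 2)

sigma_z = 0.5   # illustrative attack strength
for eta0 in (1, 2, 4, 8):
    print(eta0, polynomial_ratio(eta0, sigma_z), sinusoidal_ratio(eta0, sigma_z))
```

In both families the retained fraction of efficiency drops as $\eta(1,0)$ grows, which is the trade-off between efficiency and robustness described in the text.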

The same analysis also holds for the extension of the polynomial and sinusoidal families to the vector case. For
instance, JANIS is a solution of the fundamental equation for i.i.d. hosts and SAWGN attack, such that
$\mathrm{E}_{\mathbf{Z}}\{\nabla t(\gamma(\mathbf{s} + \mathbf{z}))\} = \gamma^{p-1}\nabla t(\mathbf{s})$. The Wiener filtering restriction is not necessary as JANIS is based on first order Hermite
polynomials. This gives the following efficiency per element $\eta(\gamma, \sigma_z) = p\gamma^{2p}$, which follows the same decreasing rule
as the scalar polynomial family. The extended sinusoidal family follows the same rule: $\eta(1, \sigma_z)/\eta(1, 0) = e^{-\eta(1,0)\sigma_z^2}$
with $\eta(1, 0) = 4\pi^2\|G^{-T}\mathbf{k}\|^2$ as shown in Appendix III.

VII. About DC-DM watermarking based on lattice quantization

Our theoretical framework does not succeed in finding back well known DC-DM watermarking schemes based on
lattice quantization, where the detection function is usually defined by a Euclidean distance $t(\mathbf{r}) = \kappa_t\|Q(\mathbf{r}) - \mathbf{r}\|^2$,
and the embedding function $\mathbf{x}(\mathbf{s}) = \alpha(Q(\mathbf{s}) - \mathbf{s})$ complies with rule (21). Parameter $\alpha$ is fixed and it plays a
crucial role in the trade-off between the embedding distortion and the inherent robustness of the scheme. Note
that our point of view is very different as we suppose that the host signal is pushed in a direction given by
$\mathbf{w}(\mathbf{s}) = k_w\nabla t(\mathbf{s}) = -2k_w\kappa_t(Q(\mathbf{s}) - \mathbf{s})$, but the watermark signal $\mathbf{x}(\mathbf{s}) = \theta\mathbf{w}(\mathbf{s})$ is not deterministic because the
amplitude $\theta$ is not fixed.

A. Efficiency without noise 

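We consider a lattice $\Lambda$ and a host whose pdf is a piecewise constant function over the partition induced by $\Lambda$:
$\mathbb{R}^p = \bigcup_{\mathbf{c}_i \in \Lambda}(\mathbf{c}_i + \mathcal{V})$. We study the detection function given by $t(\mathbf{r}) = \kappa_t(\|Q(\mathbf{r}) - \mathbf{r}\|^2 - \mu)$, with $Q$ the quantizer
associated to $\Lambda$, and $\{\kappa_t, \mu\}$ enforcing a centered unit variance tested statistic under $\mathcal{H}_0$:

$$\mu = \mathrm{vol}(\mathcal{V})^{-1}\int_{\mathcal{V}}\|\mathbf{r}\|^2 d\mathbf{r} = I(\Lambda, 2), \tag{66}$$

$$\kappa_t = -\left(\mathrm{vol}(\mathcal{V})^{-1}\int_{\mathcal{V}}\|\mathbf{r}\|^4 d\mathbf{r} - \mu^2\right)^{-1/2} = -\left(I(\Lambda, 4) - I(\Lambda, 2)^2\right)^{-1/2}. \tag{67}$$

$I(\Lambda, k)$ denotes the $k$-th normalized moment of $\mathcal{V}$, i.e. $\mathrm{vol}(\mathcal{V})^{-1}\int_{\mathcal{V}}\|\mathbf{r}\|^k d\mathbf{r}$. The embedding function is
$\mathbf{w}(\mathbf{s}) = k_w\nabla t(\mathbf{s}) = -2k_w\kappa_t(Q(\mathbf{s}) - \mathbf{s})$, i.e. a vector pointing towards the nearest element of the lattice since $\kappa_t < 0$. Constant $k_w$ is given by: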



(68) 



(69) 



2kt^i{Ka) 

Finally, (23) gives the following efficiency per element for the noiseless case:

$$\eta_\Lambda = \frac{4\,I(\Lambda, 2)}{I(\Lambda, 4) - I(\Lambda, 2)^2}.$$

For a positive scale factor $\beta < 1$ giving a finer partition induced by $\beta\Lambda$, we have a higher efficiency $\eta_{\beta\Lambda} = \beta^{-2}\eta_\Lambda$.
Therefore, lattices should be compared for partitions with $\mathrm{vol}(\mathcal{V}) = 1$. Anyway, finding the optimal lattice giving the
best efficiency is out of the scope of this paper. As an example, for the cubic lattice $\Lambda = \mathbb{Z}^p$, $\mathcal{V}$ is the centered hypercube
$[-1/2, 1/2)^p$ and $\eta = 60$. For the two-dimensional hexagonal lattice $A_2$, whose associated generating matrix is
$G = [2\ 1;\ 0\ \sqrt{3}]/\sqrt{2\sqrt{3}}$ such that $\mathrm{vol}(\mathcal{V}) = 1$, we achieve a higher efficiency per element $\eta = 1800\sqrt{3}/43 \approx 72.50$.
Compared to the square lattice $\mathbb{Z}^2$, the 'more spherical' of the two lattices is the best, when no attack is considered.
This is surprisingly different from the zero-rate case presented in [21, Sect. 3.3].

Increasing the integer $p$, there exist lattices with a nearly spherical Voronoi cell. Assuming $\mathcal{V} = B_p(R)$, the
efficiency reads $\eta = (p + 4)(p + 2)R^{-2}$. Setting $R = \Gamma(p/2 + 1)^{1/p}/\sqrt{\pi}$ such that $\mathrm{vol}(\mathcal{V}) = 1$, and using Stirling's
approximation, we achieve a linear efficiency per element: $\eta \approx 2\pi e p$. In view of Sect. V-B, the issue is now whether
we can increase parameter $p$, which is the size of the blocks. The tested statistic is expressed in terms of the squared norm of
the quantization noise of a flat host, which is not asymptotically Gaussian. Once again, we are facing the limitations
of the Pitman Noether theorem: the block based watermarking must be done with a fixed $p$.
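
The cubic and spherical figures quoted above follow from the moment formula $\eta = 4I(\Lambda,2)/(I(\Lambda,4) - I(\Lambda,2)^2)$. The sketch below is a hedged illustration: it uses the closed-form moments of a uniform variable on $[-1/2, 1/2)$ for the hypercube and the expression $(p+4)(p+2)R^{-2}$ for a unit-volume ball; the function names are introduced here for the example only.

```python
import math
from fractions import Fraction

def efficiency_cubic(p):
    """eta = 4 I(2) / (I(4) - I(2)^2) for the unit hypercube [-1/2, 1/2)^p."""
    m2 = Fraction(1, 12)                      # E{x^2} for x uniform on [-1/2, 1/2)
    m4 = Fraction(1, 80)                      # E{x^4}
    i2 = p * m2
    i4 = p * m4 + p * (p - 1) * m2 * m2       # E{(x_1^2 + ... + x_p^2)^2}
    return 4 * i2 / (i4 - i2 * i2)

def efficiency_ball(p):
    """eta = (p + 4)(p + 2) / R^2 for a ball of unit volume in dimension p."""
    radius = math.exp(math.lgamma(p / 2 + 1) / p) / math.sqrt(math.pi)
    return (p + 4) * (p + 2) / radius ** 2

for p in (1, 2, 8, 64, 1000):
    print(p, float(efficiency_cubic(p)), efficiency_ball(p), 2 * math.pi * math.e * p)
```

The hypercube stays at 60 for every $p$, whereas the spherical cell grows roughly like $2\pi e p$ for large $p$.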

B. Efficiency of a mixture of fundamental solutions

This section uses the geometric property of Sect. III-D to calculate the efficiency per element of a detection function
defined by a mixture of fundamental solutions. Suppose a family of orthonormal fundamental solutions $\{t_j\}$ with
integer indices (this is easily generalized to indices in $\mathbb{N}^p$), and create the following detection function
$t(\mathbf{r}) = \sum_j \omega_j t_j(\mathbf{r})$. We have:

$$\mathrm{E}_{\mathbf{R}}\{t(\mathbf{r})|\mathcal{H}_0\} = \sum_j \omega_j\,\mathrm{E}_{\mathbf{R}}\{t_j(\mathbf{r})|\mathcal{H}_0\} = 0, \qquad \mathrm{Var}\{t(\mathbf{r})|\mathcal{H}_0\} = \sum_j \omega_j^2 = 1. \tag{70}$$

The last equation gives a constraint on the weights $\{\omega_j\}$.

The reader must be aware of two facts. First, we have chosen here to mix some detection functions, but we
could also do the mixture on the embedding functions. Second, this mixture is a priori not a fundamental solution.
Given this mixture, we select the best embedding function $\mathbf{w}(\mathbf{s}) = k_w\nabla t(\mathbf{s})$. However, it is a priori not true that
the mixture is the best detection function knowing $\mathbf{w}(\mathbf{s})$. The mixture of detection functions implies a mixture of
the associated embedding functions, $\mathbf{w}(\mathbf{s}) = \sum_j \varpi_j\mathbf{w}_j(\mathbf{s})$, but with different weights:

$$\varpi_j = k_w\,\omega_j\sqrt{\eta_j(1,0)/n} \quad\text{and}\quad k_w = \left(\sum_i \omega_i^2\,\eta_i(1,0)/n\right)^{-1/2}.$$

This gives the efficacy when there is no attack:

$$\eta(1, 0) = \sum_j \omega_j^2\,\eta_j(1, 0). \tag{71}$$

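(63) gives the following efficiency per element under attack,

$$\eta(\gamma, \sigma_z) = \left(\sum_j \omega_j\varpi_j\sqrt{\eta_j(\gamma, \sigma_z)}\right)^2 = \frac{\left(\sum_j \omega_j^2\sqrt{\eta_j(1,0)\,\eta_j(\gamma, \sigma_z)}\right)^2}{\eta(1, 0)}, \tag{72}$$

if we suppose that $\mathrm{E}_{\mathbf{S}}\{\mathbf{w}_j(\mathbf{s})^T\mathrm{E}_{\mathbf{Z}}\{\mathbf{w}_k(\gamma(\mathbf{s} + \mathbf{z}))\}\} = \delta(j - k)\,\frac{n}{\gamma}\sqrt{\eta_j(\gamma, \sigma_z)/\eta_j(1, 0)}$, i.e. the functions stay
orthogonal even under attack. This assumption considerably simplifies the expression of the efficiency. From Sect.
VI-B we know this holds for the polynomial family ($\gamma = 1/\sqrt{1 + \sigma_z^2}$), and for the sinusoidal family ($\gamma = 1$),
because $\mathrm{E}_{\mathbf{Z}}\{\mathbf{w}_k(\gamma(\mathbf{s} + \mathbf{z}))\} \propto \mathbf{w}_k(\mathbf{s})$.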

It is quite difficult to compare mixtures of fundamental solutions and to derive the optimum weighting. Let us
denote the score $g_M(\{\omega_j\}, \gamma, \sigma_z) = \sqrt{\eta(1, 0)\,\eta(\gamma, \sigma_z)}$ for a mixture with weights $\{\omega_j\}$ and $g_P(\eta(1, 0), \gamma, \sigma_z)$ the
same score but for a pure fundamental solution whose efficiency is $\eta(1, 0) = \sum_j \omega_j^2\,\eta_j(1, 0)$ when there is no
noise.² These two scores are equal when there is no noise, otherwise they have the following expressions:

$$g_M(\{\omega_j\}, \gamma, \sigma_z) = \sum_j \omega_j^2\,\eta_j(1, 0)\,\gamma\,h_j(\gamma, \sigma_z), \tag{73}$$

$$g_P(\eta(1, 0), \gamma, \sigma_z) = \eta(1, 0)\,\gamma\,h(\gamma, \sigma_z), \tag{74}$$

where function $h$ is defined in Sect. VI-B. If the embedder knows the parameters of the attack noise, then the optimum
weighting is given by a simplex optimization: $\omega_j = \delta(j - j^*)$ with $j^* = \arg\max_j \eta_j(1, 0)\,\gamma\,h_j(\gamma, \sigma_z)$. Otherwise, we
set the following criterion: $G_M(\{\omega_j\}) = \iint g_M(\{\omega_j\}, \gamma, \sigma_z)\,d\sigma_z\,d\gamma$. This represents the average performance
of the mixture when no prior about the attack noise parameters is given.

² Such a fundamental solution might not exist for all weight distributions. For instance, the polynomial family requires that $\eta(1, 0)\sigma_X^2 \in \mathbb{N}$.

For the sinusoidal family, (72) holds if $\gamma = 1$. The integration only made over $\sigma_z$ gives:

$$G_M(\{\omega_j\}) = \sqrt{\frac{\pi}{2}}\sum_j \omega_j^2\sqrt{\eta_j(1, 0)} \leq \sqrt{\frac{\pi}{2}}\sqrt{\sum_j \omega_j^2\,\eta_j(1, 0)} = G_P(\eta(1, 0)). \tag{75}$$

The inequality is due to the concavity of the square root function and it holds for any weight distribution. In the
same way, for the polynomial family, (72) holds if $\gamma = (1 + \sigma_z^2)^{-1/2}$. The integration only made over $\gamma$ gives:

$$G_M(\{\omega_j\}) = \sum_j \omega_j^2\,\frac{\eta_j(1, 0)}{\eta_j(1, 0) + 1} \leq \frac{\eta(1, 0)}{\eta(1, 0) + 1} = G_P(\eta(1, 0)). \tag{76}$$

The inequality is due to the concavity of the function $x \to x/(1 + x)$ on $[0, +\infty)$ and it holds for any weight
distribution.

This tends to show that a pure fundamental solution is on average more robust than any mixture of fundamental
solutions. However, this is not a general proof. We have shown this only for the sinusoidal and the polynomial
families when considering attacks such that (72) holds and when $h(\gamma, \sigma_z)$ has a known expression.
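
Inequalities (75) and (76) can be illustrated with a small numerical example. The weights and efficiencies below are arbitrary illustrative values (only the constraint that the squared weights sum to one comes from (70)); the helper names are introduced for this sketch.

```python
import math

def sinusoidal_scores(weights, etas):
    """Return (G_M, G_P) of (75); weights are the omega_j, etas the eta_j(1, 0)."""
    c = math.sqrt(math.pi / 2.0)
    g_m = c * sum(w * w * math.sqrt(e) for w, e in zip(weights, etas))
    g_p = c * math.sqrt(sum(w * w * e for w, e in zip(weights, etas)))
    return g_m, g_p

def polynomial_scores(weights, etas):
    """Return (G_M, G_P) of (76)."""
    eta_mix = sum(w * w * e for w, e in zip(weights, etas))
    g_m = sum(w * w * e / (e + 1.0) for w, e in zip(weights, etas))
    return g_m, eta_mix / (eta_mix + 1.0)

weights = [math.sqrt(0.5), math.sqrt(0.3), math.sqrt(0.2)]   # squared weights sum to 1
etas = [1.0, 4.0, 9.0]                                       # illustrative efficiencies

print(sinusoidal_scores(weights, etas))
print(polynomial_scores(weights, etas))
```

In both cases the mixture score stays below the score of the pure solution with the same noiseless efficiency, with equality when a single weight is non-zero.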

C. Application to DC-DM watermarking 

Mixture is a tool which renders the study of some watermarking schemes easier. When applied on elements
of the sinusoidal family, it allows us to recreate any periodic detection function. For instance, the following
weights $\omega_j = -(-1)^j\,3\sqrt{10}/(\pi^2 j^2)$ give the Fourier series decomposition of the SCS scheme:

$$t(s) = \sqrt{2}\sum_{j \geq 1}\omega_j\cos(j\sqrt{\eta}\,s) = -\frac{6\sqrt{5}}{\Delta^2}\left((s - Q(s))^2 - \frac{\Delta^2}{12}\right), \tag{77}$$

$$w(s) = k_w t'(s) = -k_w\sqrt{2\eta}\sum_{j \geq 1} j\,\omega_j\sin(j\sqrt{\eta}\,s) = -\frac{12\sqrt{5}\,k_w}{\Delta^2}\,(s - Q(s)), \tag{78}$$

with $Q$ a quantizer whose step is $\Delta = 2\pi/\sqrt{\eta}$. The application of (72) gives the efficiency of SCS under an AWGN
attack, which is otherwise cumbersome to calculate with the direct expressions of $t$ and $w$. Here, we simply have:

where $\vartheta_3$ is the third Jacobi theta function. When there is no attack, $\eta_{SCS}(1, 0) = 60/\Delta^2 = 15\eta/\pi^2 \approx 1.52\eta$. Fig. 2
shows the efficiency per element of SCS with $\sigma_z$ ranging from 0 to 1 for $\eta = 1$. It shows that the efficiency per
element of a pure sinusoidal function starting from the same value, i.e. $\eta_{SCS}(1, 0)$, is largely more robust in this
range of noise. However, when the variance of the noise increases, the asymptotic behavior of (79) is dominated
by the first term, $j = 1$, i.e. $e^{-\eta\sigma_z^2}$, whereas the efficiency of the previous pure sinusoidal function has a stronger
exponential decay: $e^{-1.52\eta\sigma_z^2}$. In this asymptotic case, a pure sinusoidal function with efficacy $\eta$ performs better.

Fig. 2. Efficiency per element of the SCS scheme under AWGN attack against $\sigma_z$. The grey plots are the approximations by (79) for
$j_{max} = \{3, 5, 10, 20, 100\}$. The dotted line is the efficiency of the sinusoidal solution with $\eta(1, 0) = 1.52$.
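
The SCS decomposition can be explored numerically. The following is a minimal sketch, not the paper's closed form: it truncates the mixture at a hypothetical `j_max`, uses the weights $\omega_j$ quoted in the text, takes $\eta_j(1,0) = j^2\eta$ and $\eta_j(1,\sigma_z) = j^2\eta\,e^{-j^2\eta\sigma_z^2}$ from Proposition 4, and plugs them into the mixture formula (72) with $\gamma = 1$.

```python
import math

def scs_weight(j):
    """Fourier weight omega_j of the SCS detection function."""
    return -((-1) ** j) * 3.0 * math.sqrt(10.0) / (math.pi ** 2 * j ** 2)

def scs_efficiency(sigma_z, eta=1.0, j_max=20000):
    """Efficiency per element of the truncated SCS mixture via (72), AWGN attack (gamma = 1)."""
    no_attack = 0.0
    cross = 0.0
    for j in range(1, j_max + 1):
        w2 = scs_weight(j) ** 2
        eta_j0 = j * j * eta                                  # eta_j(1, 0)
        eta_ja = eta_j0 * math.exp(-eta_j0 * sigma_z ** 2)    # eta_j(1, sigma_z)
        no_attack += w2 * eta_j0
        cross += w2 * math.sqrt(eta_j0 * eta_ja)
    return cross ** 2 / no_attack

for sigma_z in (0.0, 0.25, 0.5, 1.0):
    print(sigma_z, scs_efficiency(sigma_z))
```

Without attack the truncated sum approaches $15\eta/\pi^2$, and the efficiency decreases monotonically as $\sigma_z$ grows.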

In the same way, the detection function based on the lattice quantizer of Sect. VII-A can be decomposed through a
Fourier series over lattice $\Lambda$, whose generator matrix is $G$:

$$t(\mathbf{r}) = I(\Lambda, 2) + \sqrt{2}\sum_{\mathbf{k}}\omega_{\mathbf{k}}\cos(2\pi\mathbf{r}^T G^{-T}\mathbf{k}), \tag{80}$$

with $\omega_{\mathbf{k}} = \sqrt{2}\,\mathrm{vol}(\mathcal{V})^{-1}\int_{\mathcal{V}}\|\mathbf{r}\|^2\cos(2\pi\mathbf{r}^T G^{-T}\mathbf{k})\,d\mathbf{r}$. This decomposition in Fourier series may not be easy to
obtain except for low dimension lattices. Yet, whatever the resulting weight distribution, the mixture has for
$\eta(1, \sigma_z)$, $g_M(\{\omega_{\mathbf{k}}\}, \gamma, \sigma_z)$, and $G_M(\{\omega_{\mathbf{k}}\})$ expressions equivalent to those of the one dimensional case, thanks to the
common expression of the efficiency as shown in Appendix III. Therefore, the main conclusion is still valid: under
an AWGN attack, a pure sinusoidal solution sharing the same efficiency without noise performs better on average.

VIII. Conclusion 

Rewriting classical elements of detection theory with the assumption that the watermark signal depends on 
the host gives us the expression of the best embedding function knowing the detector. Coupling this result with 
the expression of the LMP test gives a partial differential equation we named 'fundamental equation' of zero-bit 
watermarking. Its main advantage is to offer a constructive theoretical framework unifying most of the watermarking 
schemes the community knows. Moreover, a side product is that the decomposition onto a family of orthogonal
fundamental solutions provides an easier way to characterize the performance of DC-DM schemes.

Appendix I

LMP TEST 

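For a given embedding function $\mathbf{w}$, we derive the Locally Most Powerful test, whose detection function is defined
as:

$$t(\mathbf{r}) = \frac{\kappa_t}{p_{\mathbf{S}}(\mathbf{r})}\left.\frac{\partial p(\mathbf{r}|\mathcal{H}_1)}{\partial\theta}\right|_{\theta=0}. \tag{81}$$

Denote by $\mathbf{y} = \mathbf{f}(\mathbf{s}) = \mathbf{s} + \theta\mathbf{w}(\mathbf{s})$ the embedding mapping. $\theta \to 0$ makes function $\mathbf{f}$ invertible: $\mathbf{s} = \mathbf{f}^{-1}(\mathbf{y})$, and

$$p(\mathbf{r}|\mathcal{H}_1) = p_{\mathbf{S}}(\mathbf{f}^{-1}(\mathbf{r}))\,|J_{\mathbf{f}^{-1}}(\mathbf{r}, \theta)|, \tag{82}$$

with the last term being the Jacobian of $\mathbf{f}^{-1}$. Finally, the detection function is:

$$t(\mathbf{r}) = \frac{\kappa_t}{p_{\mathbf{S}}(\mathbf{r})}\left(A(\mathbf{r}) + B(\mathbf{r})\right), \tag{83}$$

where $A(\mathbf{r})$ stems from the derivative of $p_{\mathbf{S}}(\mathbf{f}^{-1}(\mathbf{r}))$ and $B(\mathbf{r})$ from the derivative of the Jacobian, both taken at $\theta = 0$. Some simple equations are:

$$\mathbf{f}(\mathbf{s})|_{\theta=0} = \mathbf{s}, \tag{84}$$
$$\mathbf{f}^{-1}(\mathbf{y})|_{\theta=0} = \mathbf{y}, \tag{85}$$
$$\mathbf{f}^{-1}(\mathbf{y}) = \mathbf{y} - \theta\mathbf{w}(\mathbf{f}^{-1}(\mathbf{y})). \tag{86}$$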



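A. Expression of $A(\mathbf{r})$

Differentiating this last expression with respect to $\theta$ gives:

$$\frac{\partial\mathbf{f}^{-1}}{\partial\theta}(\mathbf{y}) = -\mathbf{w}(\mathbf{f}^{-1}(\mathbf{y})) - \theta\,J_{\mathbf{w}}(\mathbf{f}^{-1}(\mathbf{y}))\,\frac{\partial\mathbf{f}^{-1}}{\partial\theta}(\mathbf{y}). \tag{87}$$

Hence,

$$\left.\frac{\partial\mathbf{f}^{-1}}{\partial\theta}(\mathbf{y})\right|_{\theta=0} = -\mathbf{w}(\mathbf{y}). \tag{88}$$

The elements of the Jacobian matrix are given by:

$$[J_{\mathbf{f}^{-1}}(\mathbf{y}, \theta)]_{(i,j)} = \frac{\partial f_i^{-1}}{\partial y_j}(\mathbf{y}) = \delta(i - j) - \theta\,\nabla w_i(\mathbf{f}^{-1}(\mathbf{y}))^T J_{\mathbf{f}^{-1}}(\mathbf{y}, \theta)\,\mathbf{e}_j. \tag{89}$$

The simplification taking $\theta = 0$ yields $|J_{\mathbf{f}^{-1}}(\mathbf{y}, 0)| = 1$, and the expression of $A$ is as follows:

$$A(\mathbf{r}) = -\nabla p_{\mathbf{S}}(\mathbf{r})^T\mathbf{w}(\mathbf{r}). \tag{90}$$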

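B. Expression of $B(\mathbf{r})$

This term involves the derivative of the determinant of matrix $J_{\mathbf{f}^{-1}}(\mathbf{r}, \theta)$, which is invertible as $\theta \to 0$:

$$\frac{\partial|J_{\mathbf{f}^{-1}}|}{\partial\theta}(\mathbf{r}, \theta) = |J_{\mathbf{f}^{-1}}(\mathbf{r}, \theta)|\;\mathrm{tr}\!\left(J_{\mathbf{f}^{-1}}(\mathbf{r}, \theta)^{-1}\,\frac{\partial J_{\mathbf{f}^{-1}}}{\partial\theta}(\mathbf{r}, \theta)\right). \tag{91}$$

Taking $\theta = 0$ gives:

$$\frac{\partial|J_{\mathbf{f}^{-1}}|}{\partial\theta}(\mathbf{r}, 0) = \mathrm{tr}\!\left(\frac{\partial J_{\mathbf{f}^{-1}}}{\partial\theta}(\mathbf{r}, 0)\right). \tag{92}$$

The derivative of (89) gives the elements of matrix $\frac{\partial J_{\mathbf{f}^{-1}}}{\partial\theta}(\mathbf{r}, \theta)$:

$$\left[\frac{\partial J_{\mathbf{f}^{-1}}}{\partial\theta}(\mathbf{r}, \theta)\right]_{(i,j)} = -\nabla w_i(\mathbf{f}^{-1}(\mathbf{r}))^T J_{\mathbf{f}^{-1}}(\mathbf{r}, \theta)\,\mathbf{e}_j - \theta\,\frac{\partial}{\partial\theta}\left(\nabla w_i(\mathbf{f}^{-1}(\mathbf{r}))^T J_{\mathbf{f}^{-1}}(\mathbf{r}, \theta)\,\mathbf{e}_j\right). \tag{93}$$

These elements are therefore equal to $-\nabla w_i(\mathbf{r})^T\mathbf{e}_j$ when $\theta = 0$, and, finally, $B(\mathbf{r}) = -p_{\mathbf{S}}(\mathbf{r})\,\mathrm{div}(\mathbf{w}(\mathbf{r}))$.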
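
The result $A(\mathbf{r}) + B(\mathbf{r}) = -\mathrm{div}(p_{\mathbf{S}}(\mathbf{r})\mathbf{w}(\mathbf{r}))$ can be checked in one dimension. This is a minimal sketch under stated assumptions: a unit Gaussian host and an arbitrary smooth embedding function $w(s) = \sin(s)$ chosen only for illustration, with the normalisation constant omitted. It compares a finite-difference derivative of $p(r|\mathcal{H}_1)$ at $\theta = 0$ with the closed form.

```python
import math

def p_s(x):
    """Unit Gaussian host pdf (assumption of this sketch)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def w(x):
    """Illustrative embedding function (hypothetical choice)."""
    return math.sin(x)

def dw(x):
    return math.cos(x)

def pdf_h1(r, theta):
    """p(r|H1) = p_S(f^{-1}(r)) |d f^{-1}/dr| for f(s) = s + theta w(s)."""
    s = r
    for _ in range(100):              # fixed-point iteration for s = r - theta w(s)
        s = r - theta * w(s)
    return p_s(s) / abs(1.0 + theta * dw(s))

def lmp_numeric(r, h=1e-4):
    """Finite-difference estimate of (1/p_S) d p(r|H1)/d theta at theta = 0."""
    return (pdf_h1(r, h) - pdf_h1(r, -h)) / (2.0 * h) / p_s(r)

def lmp_closed_form(r):
    """(A + B)/p_S = -(p_S' w + p_S w')/p_S, with p_S'/p_S = -r for the unit Gaussian."""
    return r * w(r) - dw(r)

for r in (-1.3, 0.2, 0.9):
    print(r, lmp_numeric(r), lmp_closed_form(r))
```

The two columns agree up to the finite-difference error, in line with (90) and the expression of $B$.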

Appendix II 

MACLAURIN SERIES OF $\mathrm{Var}\{t(r)|\mathcal{H}_1\}$ WITHOUT ATTACK

We take the Maclaurin series of $t(s + \theta w(s))^2$, and take the expectation:

$$\mathrm{E}_S\{t(s + \theta w(s))^2\} = 1 + 2\theta\,\mathrm{E}_S\{w(s)t'(s)t(s)\} + \theta^2\,\mathrm{E}_S\{w(s)^2(t'(s)^2 + t(s)t''(s))\} + O(\theta^3). \tag{94}$$

If $t$ is an odd function, then $t'$ and $w = k_w t'$ are even functions. The second term of the series is null. If $t$ is an
even function, the second term is not null as shown in Table IV-A.1.

A. First order term for even polynomial function

An even polynomial detection function means $t(s) = \kappa_k H_k(s)$, with $k$ even and $\kappa_k = (k!)^{-1/2}$ (probabilists' definition).
Then, $t'(s) = \kappa_k k H_{k-1}(s)$ and $w(s) = k_w\kappa_k k H_{k-1}(s) = \kappa_{k-1}H_{k-1}(s)$. Therefore, $\mathrm{E}_S\{w(s)t'(s)t(s)\} =
\kappa_k^2\kappa_{k-1}k\,\mathrm{E}_S\{H_k(s)H_{k-1}(s)^2\}$. A known formula of the square of Hermite polynomials is the following one:

$$H_{k-1}(s)^2 = \sum_{\ell=0}^{k-1}\binom{k-1}{\ell}^2\ell!\,H_{2k-2-2\ell}(s). \tag{95}$$

The orthogonality of the Hermite polynomial family allows us to conclude that:

$$\mathrm{E}_S\{w(s)t'(s)t(s)\} = \kappa_k^2\kappa_{k-1}k\binom{k-1}{k/2-1}^2(k/2-1)!\,k! = \frac{k\,((k-1)!)^{3/2}}{(k/2-1)!\,((k/2)!)^2}. \tag{96}$$

The application of the Stirling approximation, when $k$ is large, gives $\mathrm{E}_S\{w(s)t'(s)t(s)\} \approx \sqrt{2}\,(2\pi)^{-3/4}\,2^{3k/2}\,k^{-1/4}$.
The derivation of the second order term is tackled in the following section.

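B. Second order term

In a similar way, we have:

$$w(s)^2 t'(s)^2 = \frac{k}{((k-1)!)^2}\left(\sum_{\ell=0}^{k-1}\binom{k-1}{\ell}^2\ell!\,H_{2k-2-2\ell}(s)\right)^2, \tag{97}$$

whose expectation, thanks to the orthogonality feature, simplifies to:

$$\mathrm{E}_S\{w(s)^2 t'(s)^2\} = \frac{k}{((k-1)!)^2}\sum_{\ell=0}^{k-1}\binom{k-1}{\ell}^4(\ell!)^2\,(2k-2-2\ell)!. \tag{98}$$

The second term is slightly different:

$$w(s)^2 t''(s)t(s) = \frac{1}{(k-1)!\,(k-2)!}\,H_{k-1}(s)^2 H_k(s)H_{k-2}(s) \tag{99}$$

$$= \frac{1}{(k-1)!\,(k-2)!}\left(\sum_{\ell=0}^{k-1}\binom{k-1}{\ell}^2\ell!\,H_{2k-2-2\ell}(s)\right)\left(\sum_{\ell=0}^{k-2}\binom{k}{\ell}\binom{k-2}{\ell}\ell!\,H_{2k-2-2\ell}(s)\right), \tag{100}$$

whose expectation is

$$\mathrm{E}_S\{w(s)^2 t''(s)t(s)\} = \frac{1}{(k-1)!\,(k-2)!}\sum_{\ell=0}^{k-2}\binom{k-1}{\ell}^2\binom{k}{\ell}\binom{k-2}{\ell}(\ell!)^2\,(2k-2-2\ell)!. \tag{102}$$

C. Final expression

Subtracting the square of $\mathrm{E}_R\{t(r)|\mathcal{H}_1\} = \sqrt{k}\,\theta + O(\theta^2)$ from (94), we get:

$$\mathrm{Var}\{t(r)|\mathcal{H}_1\} = 1 + 2\theta\,\mathrm{E}_S\{w(s)t'(s)t(s)\} + \theta^2\left(\mathrm{E}_S\{w(s)^2 t'(s)^2\} + \mathrm{E}_S\{w(s)^2 t''(s)t(s)\} - k\right) + O(\theta^3), \tag{103}$$

where the three expectations are given by (96), (98) and (102).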
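
The coefficients of this expansion can be evaluated with integer arithmetic. The sketch below is an illustration under the assumptions of this appendix (unit Gaussian host, $t = H_k/\sqrt{k!}$ with $k$ even, $w = H_{k-1}/\sqrt{(k-1)!}$); the function names are introduced here. For $k = 2$ the variance is known in closed form, $\mathrm{Var}\{t(r)|\mathcal{H}_1\} = (1+\theta)^4$, which provides a sanity check of the first coefficients.

```python
import math

def first_order(k):
    """E{w t' t} for t = He_k / sqrt(k!), k even, unit Gaussian host."""
    half = k // 2
    kappa_prev = 1.0 / math.sqrt(math.factorial(k - 1))
    return kappa_prev * k * math.comb(k - 1, half - 1) ** 2 * math.factorial(half - 1)

def second_order_a(k):
    """E{w^2 t'^2}."""
    total = sum(math.comb(k - 1, l) ** 4 * math.factorial(l) ** 2
                * math.factorial(2 * k - 2 - 2 * l) for l in range(k))
    return k * total / math.factorial(k - 1) ** 2

def second_order_b(k):
    """E{w^2 t'' t}."""
    total = sum(math.comb(k - 1, l) ** 2 * math.comb(k, l) * math.comb(k - 2, l)
                * math.factorial(l) ** 2 * math.factorial(2 * k - 2 - 2 * l)
                for l in range(k - 1))
    return total / (math.factorial(k - 1) * math.factorial(k - 2))

def variance_coefficients(k):
    """Coefficients of 1, theta, theta^2 in the expansion of Var{t(r)|H1}."""
    return 1.0, 2.0 * first_order(k), second_order_a(k) + second_order_b(k) - k

def stirling_first_order(k):
    """Large-k approximation of first_order(k)."""
    return math.sqrt(2.0) * (2.0 * math.pi) ** -0.75 * 2.0 ** (1.5 * k) * k ** -0.25

print(variance_coefficients(2))
print(first_order(40) / stirling_first_order(40))
```

The rapid growth of the first-order coefficient with $k$ illustrates why the variance under $\mathcal{H}_1$ increases quickly with the efficacy.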

Appendix III 

Efficacy of the extended sinusoidal family under AWGN attack 
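We have $\nabla t(\mathbf{r}) = -\sqrt{2}\sin(\mathbf{r}^T\boldsymbol{\lambda}_{\mathbf{k}})\boldsymbol{\lambda}_{\mathbf{k}}$. Therefore:

$$\mathrm{E}_{\mathbf{Z}}\{\nabla t(\mathbf{r} + \mathbf{z})\} = -\sqrt{2}\left(\sin(\mathbf{r}^T\boldsymbol{\lambda}_{\mathbf{k}})\,\mathrm{E}_{\mathbf{Z}}\{\cos(\mathbf{z}^T\boldsymbol{\lambda}_{\mathbf{k}})\} + \cos(\mathbf{r}^T\boldsymbol{\lambda}_{\mathbf{k}})\,\mathrm{E}_{\mathbf{Z}}\{\sin(\mathbf{z}^T\boldsymbol{\lambda}_{\mathbf{k}})\}\right)\boldsymbol{\lambda}_{\mathbf{k}}. \tag{105}$$

The last term is null when the pdf of $\mathbf{Z}$ is even (i.e. $p_{\mathbf{Z}}(\mathbf{z}) = p_{\mathbf{Z}}(-\mathbf{z})$) because $\sin(\mathbf{z}^T\boldsymbol{\lambda}_{\mathbf{k}})$ is odd. Thus, if
$\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \sigma_z^2 I)$, then $\mathrm{E}_{\mathbf{Z}}\{\nabla t(\mathbf{r} + \mathbf{z})\} = h(1, \sigma_z)\nabla t(\mathbf{r})$, with

$$h(1, \sigma_z) = \mathrm{E}_{\mathbf{Z}}\{\cos(\mathbf{z}^T\boldsymbol{\lambda}_{\mathbf{k}})\} \tag{106}$$

$$= \mathrm{E}_{\mathbf{Z}}\Big\{\cos\Big(\sum_{i=1}^{p} z_i\lambda_{k,i}\Big)\Big\} \tag{107}$$

$$= \mathrm{E}_{Z_1}\{\cos(z_1\lambda_{k,1})\}\,\mathrm{E}_{\mathbf{Z}}\Big\{\cos\Big(\sum_{i=2}^{p} z_i\lambda_{k,i}\Big)\Big\} - \mathrm{E}_{Z_1}\{\sin(z_1\lambda_{k,1})\}\,\mathrm{E}_{\mathbf{Z}}\Big\{\sin\Big(\sum_{i=2}^{p} z_i\lambda_{k,i}\Big)\Big\} \tag{108}$$

$$= e^{-\sigma_z^2\lambda_{k,1}^2/2}\,\mathrm{E}_{\mathbf{Z}}\Big\{\cos\Big(\sum_{i=2}^{p} z_i\lambda_{k,i}\Big)\Big\}. \tag{109}$$

Repeating $p - 1$ times the last two lines, we finally get:

$$h(1, \sigma_z) = e^{-\sigma_z^2\|\boldsymbol{\lambda}_{\mathbf{k}}\|^2/2} = e^{-\eta(1,0)\sigma_z^2/2}. \tag{110}$$

Therefore: $\eta(1, \sigma_z) = \eta(1, 0)\,e^{-\eta(1,0)\sigma_z^2}$.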